Redis reference
- 1: ARM support
- 2: Redis client handling
- 3: Redis cluster specification
- 4: Debugging
- 5: Redis and the Gopher protocol
- 6: Redis internals
- 6.1: Event library
- 6.2: String internals
- 6.3: Virtual memory (deprecated)
- 6.4: Redis design draft #2 (historical)
- 7: Redis modules API
- 7.1: Modules API reference
- 7.2: Redis modules and blocking commands
- 7.3: Modules API for native types
- 8: RESP protocol spec
- 9: Redis signal handling
- 10: Sentinel client spec
- 11: Redis command arguments
- 12: Redis command tips
- 13: Optimizing Redis
- 13.1: Redis benchmark
- 13.2: Redis CPU profiling
- 13.3: Diagnosing latency issues
- 13.4: Redis latency monitoring
- 13.5: Memory optimization
- 14: Redis programming patterns
- 14.1: Bulk loading
- 14.2: Distributed Locks with Redis
- 14.3: Secondary indexing
- 14.4: Redis patterns example
1 - ARM support
Redis versions 4.0 and above support the ARM processor in general, and the Raspberry Pi specifically, as a main platform. Every new release of Redis is tested on the Pi environment, and we update this documentation page with information about supported devices and other useful information. While Redis does run on Android, we look forward to extending our testing efforts to Android in the future to also make it an officially supported platform.
We believe that Redis is ideal for IoT and embedded devices for several reasons:
- Redis has a very small memory footprint and modest CPU requirements. It can run on small devices like the Raspberry Pi Zero without impacting the overall performance, using a small amount of memory while delivering good performance for many use cases.
- The data structures of Redis are often an ideal way to model IoT/embedded use cases. Some examples include accumulating time series data, receiving or queuing commands to execute or respond to send back to the remote servers, and so forth.
- Modeling data inside Redis can be very useful in order to make in-device decisions for appliances that must respond very quickly or when the remote servers are offline.
- Redis can be used as a communication system between the processes running on the device.
- The append-only file storage of Redis is well suited for SSD cards.
- The stream data structure included in Redis versions 5.0 and higher was specifically designed for time series applications and has a very low memory overhead.
Redis /proc/cpu/alignment requirements
Linux on ARM allows trapping unaligned accesses and fixing them inside the kernel in order to continue the execution of the offending program instead of generating a SIGBUS. Redis 4.0 and greater are fixed in order to avoid any kind of unaligned access, so there is no need to set a specific value for this kernel configuration. Even when kernel alignment fixing is disabled, Redis should run as expected.
Building Redis on the Pi
- Download Redis version 4.0 or higher.
- Use make as usual to create the executable.
There is nothing special in the process. The only difference is that by default, Redis uses the libc allocator instead of defaulting to jemalloc as it does in other Linux based environments. This is because we believe that for the small use cases inside embedded devices, memory fragmentation is unlikely to be a problem. Moreover, jemalloc on ARM may not be as well tested as the libc allocator.
Performance
Performance testing of Redis was performed on the Raspberry Pi 3 and Pi 1 model B. The difference between the two Pis in terms of delivered performance is quite big. The benchmarks were performed via the loopback interface, since most use cases will probably use Redis from within the device and not via the network. The following numbers were obtained using Redis 4.0.
Raspberry Pi 3:
- Test 1: 5 million writes with 1 million keys (even distribution among keys). No persistence, no pipelining. 28,000 ops/sec.
- Test 2: Like test 1 but with pipelining using groups of 8 operations: 80,000 ops/sec.
- Test 3: Like test 1 but with AOF enabled, fsync every 1 sec: 23,000 ops/sec.
- Test 4: Like test 3, but with an AOF rewrite in progress: 21,000 ops/sec.
Raspberry Pi 1 model B:
- Test 1: 5 million writes with 1 million keys (even distribution among keys). No persistence, no pipelining. 2,200 ops/sec.
- Test 2: Like test 1 but with pipelining using groups of 8 operations: 8,500 ops/sec.
- Test 3: Like test 1 but with AOF enabled, fsync every 1 sec: 1,820 ops/sec.
- Test 4: Like test 3, but with an AOF rewrite in progress: 1,000 ops/sec.
The benchmarks above refer to simple SET/GET operations. The performance is similar for all the fast Redis operations (those not running in linear time). However, sorted sets may show slightly slower numbers.
2 - Redis client handling
This document provides information about how Redis handles clients at the network layer level: connections, timeouts, buffers, and other similar topics are covered here.
The information contained in this document is only applicable to Redis version 2.6 or greater.
Accepting Client Connections
Redis accepts client connections on the configured TCP port and on the Unix socket if enabled. When a new client connection is accepted the following operations are performed:
- The client socket is put in the non-blocking state since Redis uses multiplexing and non-blocking I/O.
- The TCP_NODELAY option is set in order to ensure that there are no delays to the connection.
- A readable file event is created so that Redis is able to collect the client queries as soon as new data is available to read on the socket.
After the client is initialized, Redis checks if it is already at the limit configured for the number of simultaneous clients (configured using the maxclients configuration directive, see the next section of this document for further information).
When Redis can’t accept a new client connection because the maximum number of clients has been reached, it tries to send an error to the client in order to make it aware of this condition, closing the connection immediately. The error message will reach the client even if the connection is closed immediately by Redis because the new socket output buffer is usually big enough to contain the error, so the kernel will handle transmission of the error.
What Order are Client Requests Served In?
The order is determined by a combination of the client socket file descriptor number and order in which the kernel reports events, so the order should be considered as unspecified.
However, Redis does the following two things when serving clients:
- It only performs a single read() system call every time there is something new to read from the client socket. This ensures that if we have multiple clients connected, and a few send queries at a high rate, other clients are not penalized and will not experience latency issues.
- However, once new data is read from a client, all the queries contained in the current buffers are processed sequentially. This improves locality and avoids iterating a second time to see if there are clients that need some processing time.
Maximum Concurrent Connected Clients
In Redis 2.4 there was a hard-coded limit for the maximum number of clients that could be handled simultaneously.
In Redis 2.6 and newer, this limit is dynamic: by default it is set to 10000 clients, unless otherwise stated by the maxclients directive in redis.conf.
However, Redis checks with the kernel what the maximum number of file descriptors that we are able to open is (the soft limit is checked). If the limit is less than the maximum number of clients we want to handle, plus 32 (that is the number of file descriptors Redis reserves for internal uses), then the maximum number of clients is updated to match the number of clients it is really able to handle under the current operating system limit.
When maxclients is set to a number greater than Redis can support, a message is logged at startup:
$ ./redis-server --maxclients 100000
[41422] 23 Jan 11:28:33.179 # Unable to set the max number of files limit to 100032 (Invalid argument), setting the max clients configuration to 10112.
When Redis is configured in order to handle a specific number of clients it is a good idea to make sure that the operating system limit for the maximum number of file descriptors per process is also set accordingly.
Under Linux these limits can be set both in the current session and as a system-wide setting with the following commands:
ulimit -Sn 100000 # This will only work if hard limit is big enough.
sysctl -w fs.file-max=100000
Output Buffer Limits
Redis needs to handle a variable-length output buffer for every client, since a command can produce a large amount of data that needs to be transferred to the client.
However it is possible that a client sends more commands producing more output to serve at a faster rate than that which Redis can send the existing output to the client. This is especially true with Pub/Sub clients in case a client is not able to process new messages fast enough.
Both conditions will cause the client output buffer to grow and consume more and more memory. For this reason by default Redis sets limits to the output buffer size for different kind of clients. When the limit is reached the client connection is closed and the event logged in the Redis log file.
There are two kinds of limits Redis uses:
- The hard limit is a fixed limit that, when reached, will make Redis close the client connection as soon as possible.
- The soft limit instead depends on time: for instance, a soft limit of 32 megabytes per 10 seconds means that if the client has an output buffer bigger than 32 megabytes for 10 continuous seconds, the connection gets closed.
Different kind of clients have different default limits:
- Normal clients have a default limit of 0, that is, no limit at all, because most normal clients use blocking implementations, sending a single command and waiting for the reply to be completely read before sending the next command, so it is almost never desirable to close the connection of a normal client.
- Pub/Sub clients have a default hard limit of 32 megabytes and a soft limit of 8 megabytes per 60 seconds.
- Replicas have a default hard limit of 256 megabytes and a soft limit of 64 megabytes per 60 seconds.
It is possible to change the limit at runtime using the CONFIG SET command, or in a permanent way using the Redis configuration file redis.conf. See the example redis.conf in the Redis distribution for more information about how to set the limit.
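For reference, the client-output-buffer-limit directive takes a client class, a hard limit, a soft limit, and the soft limit seconds; the defaults described above correspond to:

```
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit pubsub 32mb 8mb 60
client-output-buffer-limit replica 256mb 64mb 60
```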
Query Buffer Hard Limit
Every client is also subject to a query buffer limit. This is a non-configurable hard limit that will close the connection when the client query buffer (that is the buffer we use to accumulate commands from the client) reaches 1 GB, and is actually only an extreme limit to avoid a server crash in case of client or server software bugs.
Client Eviction
Redis is built to handle a very large number of client connections. Client connections tend to consume memory, and when there are many of them, the aggregate memory consumption can be extremely high, leading to data eviction or out-of-memory errors. These cases can be mitigated to an extent using output buffer limits, but Redis provides a more robust configuration option to limit the aggregate memory used by all clients' connections.
This mechanism is called client eviction, and it’s essentially a safety mechanism that will disconnect clients once the aggregate memory usage of all clients is above a threshold.
The mechanism first attempts to disconnect clients that use the most memory.
It disconnects the minimal number of clients needed to return below the maxmemory-clients threshold.
maxmemory-clients defines the maximum aggregate memory usage of all clients connected to Redis.
The aggregation takes into account all the memory used by the client connections: the query buffer, the output buffer, and other intermediate buffers.
Note that replica and master connections aren’t affected by the client eviction mechanism. Therefore, such connections are never evicted.
maxmemory-clients can be set permanently in the configuration file (redis.conf) or via the CONFIG SET command. This setting can either be 0 (meaning no limit), a size in bytes (possibly with a mb/gb suffix), or a percentage of maxmemory by using the % suffix (e.g. setting it to 10% would mean 10% of the maxmemory configuration).
The default setting is 0, meaning client eviction is turned off by default.
However, for any large production deployment, it is highly recommended to configure some non-zero maxmemory-clients value. A value of 5%, for example, can be a good place to start.
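For example, to start with the suggested 5% (in redis.conf, or at runtime with CONFIG SET maxmemory-clients 5%):

```
maxmemory-clients 5%
```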
It is possible to flag a specific client connection to be excluded from the client eviction mechanism.
This is useful for control path connections.
If, for example, you have an application that monitors the server via the INFO command and alerts you in case of a problem, you might want to make sure this connection isn't evicted. You can do so using the following command (from the relevant client's connection):
CLIENT NO-EVICT on
And you can revert that with:
CLIENT NO-EVICT off
For more information and an example refer to the maxmemory-clients section in the default redis.conf file.
Client eviction is available from Redis 7.0.
Client Timeouts
By default recent versions of Redis don’t close the connection with the client if the client is idle for many seconds: the connection will remain open forever.
However if you don’t like this behavior, you can configure a timeout, so that if the client is idle for more than the specified number of seconds, the client connection will be closed.
You can configure this limit via redis.conf or simply using CONFIG SET timeout <value>.
Note that the timeout only applies to normal clients; it does not apply to Pub/Sub clients, since a Pub/Sub connection is a push-style connection where an idle client is the norm.
Even if by default connections are not subject to timeout, there are two conditions when it makes sense to set a timeout:
- Mission critical applications where a bug in the client software may saturate the Redis server with idle connections, causing service disruption.
- As a debugging mechanism in order to be able to connect with the server if a bug in the client software saturates the server with idle connections, making it impossible to interact with the server.
Timeouts are not to be considered very precise: Redis avoids setting timer events or running O(N) algorithms in order to check idle clients, so the check is performed incrementally from time to time. This means that it is possible that while the timeout is set to 10 seconds, the client connection will be closed, for instance, after 12 seconds if many clients are connected at the same time.
The CLIENT Command
The Redis CLIENT command allows you to inspect the state of every connected client, to kill a specific client, and to name connections. It is a very powerful debugging tool if you use Redis at scale.
CLIENT LIST is used in order to obtain a list of connected clients and their state:
redis 127.0.0.1:6379> client list
addr=127.0.0.1:52555 fd=5 name= age=855 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=client
addr=127.0.0.1:52787 fd=6 name= age=6 idle=5 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping
In the above example two clients are connected to the Redis server. Let’s look at what some of the data returned represents:
- addr: The client address, that is, the client IP and the remote port number it used to connect with the Redis server.
- fd: The client socket file descriptor number.
- name: The client name as set by CLIENT SETNAME.
- age: The number of seconds the connection has existed.
- idle: The number of seconds the connection is idle.
- flags: The kind of client (N means normal client, check the full list of flags).
- omem: The amount of memory used by the client for the output buffer.
- cmd: The last executed command.
See the CLIENT LIST documentation for the full listing of fields and their purpose.
Once you have the list of clients, you can close a client's connection using the CLIENT KILL command, specifying the client address as its argument.
The commands CLIENT SETNAME and CLIENT GETNAME can be used to set and get the connection name. Starting with Redis 4.0, the client name is shown in the SLOWLOG output, to help identify clients that create latency issues.
TCP keepalive
From version 3.2 onwards, Redis has TCP keepalive (the SO_KEEPALIVE socket option) enabled by default, set to about 300 seconds. This option is useful in order to detect dead peers (clients that cannot be reached even if they look connected). Moreover, if there is network equipment between clients and servers that needs to see some traffic in order to keep the connection open, the option will prevent unexpected connection closed events.
3 - Redis cluster specification
Welcome to the Redis Cluster Specification. Here you’ll find information about the algorithms and design rationales of Redis Cluster. This document is a work in progress as it is continuously synchronized with the actual implementation of Redis.
Main properties and rationales of the design
Redis Cluster goals
Redis Cluster is a distributed implementation of Redis with the following goals in order of importance in the design:
- High performance and linear scalability up to 1000 nodes. There are no proxies, asynchronous replication is used, and no merge operations are performed on values.
- Acceptable degree of write safety: the system tries (in a best-effort way) to retain all the writes originating from clients connected with the majority of the master nodes. Usually there are small windows where acknowledged writes can be lost. Windows to lose acknowledged writes are larger when clients are in a minority partition.
- Availability: Redis Cluster is able to survive partitions where the majority of the master nodes are reachable and there is at least one reachable replica for every master node that is no longer reachable. Moreover using replicas migration, masters no longer replicated by any replica will receive one from a master which is covered by multiple replicas.
What is described in this document is implemented in Redis 3.0 or greater.
Implemented subset
Redis Cluster implements all the single key commands available in the non-distributed version of Redis. Commands performing complex multi-key operations like set unions and intersections are implemented for cases where all of the keys involved in the operation hash to the same slot.
Redis Cluster implements a concept called hash tags that can be used to force certain keys to be stored in the same hash slot. However, during manual resharding, multi-key operations may become unavailable for some time while single-key operations are always available.
Redis Cluster does not support multiple databases like the standalone version of Redis. We only support database 0; the SELECT command is not allowed.
Client and Server roles in the Redis cluster protocol
In Redis Cluster, nodes are responsible for holding the data, and taking the state of the cluster, including mapping keys to the right nodes. Cluster nodes are also able to auto-discover other nodes, detect non-working nodes, and promote replica nodes to master when needed in order to continue to operate when a failure occurs.
To perform their tasks all the cluster nodes are connected using a TCP bus and a binary protocol, called the Redis Cluster Bus. Every node is connected to every other node in the cluster using the cluster bus. Nodes use a gossip protocol to propagate information about the cluster in order to discover new nodes, to send ping packets to make sure all the other nodes are working properly, and to send cluster messages needed to signal specific conditions. The cluster bus is also used in order to propagate Pub/Sub messages across the cluster and to orchestrate manual failovers when requested by users (manual failovers are failovers which are not initiated by the Redis Cluster failure detector, but by the system administrator directly).
Since cluster nodes are not able to proxy requests, clients may be redirected to other nodes using the redirection errors -MOVED and -ASK.
The client is in theory free to send requests to all the nodes in the cluster, getting redirected if needed, so the client is not required to hold the state of the cluster. However, clients that are able to cache the map between keys and nodes can improve performance considerably.
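For example, a client asking a node for a key it does not serve receives a redirection error naming the hash slot and the node that owns it (the key, slot number, and address below are illustrative):

```
redis 127.0.0.1:6379> GET somekey
(error) MOVED 3999 127.0.0.1:6381
```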
Write safety
Redis Cluster uses asynchronous replication between nodes, and a last-failover-wins implicit merge function. This means that the last elected master's dataset eventually replaces all the other replicas. There is always a window of time when it is possible to lose writes during partitions. However, these windows are very different in the case of a client that is connected to the majority of masters, and a client that is connected to the minority of masters.
Redis Cluster tries harder to retain writes that are performed by clients connected to the majority of masters, compared to writes performed in the minority side. The following are examples of scenarios that lead to loss of acknowledged writes received in the majority partitions during failures:
- A write may reach a master, but while the master may be able to reply to the client, the write may not be propagated to replicas via the asynchronous replication used between master and replica nodes. If the master dies without the write reaching the replicas, the write is lost forever if the master is unreachable for a long enough period that one of its replicas is promoted. This is usually hard to observe in the case of a total, sudden failure of a master node since masters try to reply to clients (with the acknowledge of the write) and replicas (propagating the write) at about the same time. However, it is a real-world failure mode.
- Another theoretically possible failure mode where writes are lost is the following:
- A master is unreachable because of a partition.
- It gets failed over by one of its replicas.
- After some time it may be reachable again.
- A client with an out-of-date routing table may write to the old master before it is converted into a replica (of the new master) by the cluster.
The second failure mode is unlikely to happen because master nodes unable to communicate with the majority of the other masters for enough time to be failed over will no longer accept writes, and when the partition is fixed writes are still refused for a small amount of time to allow other nodes to inform about configuration changes. This failure mode also requires that the client’s routing table has not yet been updated.
Writes targeting the minority side of a partition have a larger window in which to get lost. For example, Redis Cluster loses a non-trivial number of writes on partitions where there is a minority of masters and at least one or more clients, since all the writes sent to the masters may potentially get lost if the masters are failed over in the majority side.
Specifically, for a master to be failed over it must be unreachable by the majority of masters for at least NODE_TIMEOUT, so if the partition is fixed before that time, no writes are lost. When the partition lasts for more than NODE_TIMEOUT, all the writes performed in the minority side up to that point may be lost. However, the minority side of a Redis Cluster will start refusing writes as soon as NODE_TIMEOUT time has elapsed without contact with the majority, so there is a maximum window after which the minority becomes no longer available. Hence, no writes are accepted or lost after that time.
Availability
Redis Cluster is not available in the minority side of the partition. In the majority side of the partition, assuming that there are at least the majority of masters and a replica for every unreachable master, the cluster becomes available again after NODE_TIMEOUT time plus a few more seconds required for a replica to get elected and fail over its master (failovers are usually executed in a matter of 1 or 2 seconds).
This means that Redis Cluster is designed to survive failures of a few nodes in the cluster, but it is not a suitable solution for applications that require availability in the event of large net splits.
In the example of a cluster composed of N master nodes where every node has a single replica, the majority side of the cluster will remain available as long as a single node is partitioned away, and will remain available with a probability of 1-(1/(N*2-1)) when two nodes are partitioned away (after the first node fails we are left with N*2-1 nodes in total, and the probability of the only master without a replica failing is 1/(N*2-1)).
For example, in a cluster with 5 nodes and a single replica per node, there is a 1/(5*2-1) = 11.11% probability that after two nodes are partitioned away from the majority, the cluster will no longer be available.
Thanks to a Redis Cluster feature called replicas migration the Cluster availability is improved in many real world scenarios by the fact that replicas migrate to orphaned masters (masters no longer having replicas). So at every successful failure event, the cluster may reconfigure the replicas layout in order to better resist the next failure.
Performance
In Redis Cluster nodes don’t proxy commands to the right node in charge for a given key, but instead they redirect clients to the right nodes serving a given portion of the key space.
Eventually clients obtain an up-to-date representation of the cluster and which node serves which subset of keys, so during normal operations clients directly contact the right nodes in order to send a given command.
Because of the use of asynchronous replication, nodes do not wait for other nodes' acknowledgment of writes (unless explicitly requested using the WAIT command). Also, because multi-key commands are limited to keys hashing to the same slot, data is never moved between nodes except when resharding.
Normal operations are handled exactly as in the case of a single Redis instance. This means that in a Redis Cluster with N master nodes you can expect the same performance as a single Redis instance multiplied by N as the design scales linearly. At the same time the query is usually performed in a single round trip, since clients usually retain persistent connections with the nodes, so latency figures are also the same as the single standalone Redis node case.
Very high performance and scalability while preserving weak but reasonable forms of data safety and availability is the main goal of Redis Cluster.
Why merge operations are avoided
The Redis Cluster design avoids conflicting versions of the same key-value pair in multiple nodes, since in the case of the Redis data model this is not always desirable. Values in Redis are often very large; it is common to see lists or sorted sets with millions of elements. Also, data types are semantically complex. Transferring and merging these kinds of values can be a major bottleneck and/or may require the non-trivial involvement of application-side logic, additional memory to store meta-data, and so forth.
There are no strict technological limits here. CRDTs or synchronously replicated state machines can model complex data types similar to Redis. However, the actual run time behavior of such systems would not be similar to Redis Cluster. Redis Cluster was designed in order to cover the exact use cases of the non-clustered Redis version.
Overview of Redis Cluster main components
Key distribution model
The cluster’s key space is split into 16384 slots, effectively setting an upper limit for the cluster size of 16384 master nodes (however, the suggested maximum size is on the order of ~1000 nodes).
Each master node in a cluster handles a subset of the 16384 hash slots. The cluster is stable when there is no cluster reconfiguration in progress (i.e. where hash slots are being moved from one node to another). When the cluster is stable, a single hash slot will be served by a single node (however the serving node can have one or more replicas that will replace it in the case of net splits or failures, and that can be used in order to scale read operations where reading stale data is acceptable).
The base algorithm used to map keys to hash slots is the following (read the next paragraph for the hash tag exception to this rule):
HASH_SLOT = CRC16(key) mod 16384
The CRC16 is specified as follows:
- Name: XMODEM (also known as ZMODEM or CRC-16/ACORN)
- Width: 16 bit
- Poly: 1021 (That is actually x^16 + x^12 + x^5 + 1)
- Initialization: 0000
- Reflect Input byte: False
- Reflect Output CRC: False
- Xor constant to output CRC: 0000
- Output for “123456789”: 31C3
14 out of 16 CRC16 output bits are used (this is why there is a modulo 16384 operation in the formula above).
In our tests CRC16 behaved remarkably well in distributing different kinds of keys evenly across the 16384 slots.
Note: A reference implementation of the CRC16 algorithm used is available in the Appendix A of this document.
Hash tags
There is an exception for the computation of the hash slot that is used in order to implement hash tags. Hash tags are a way to ensure that multiple keys are allocated in the same hash slot. This is used in order to implement multi-key operations in Redis Cluster.
To implement hash tags, the hash slot for a key is computed in a slightly different way in certain conditions. If the key contains a "{…}" pattern, only the substring between { and } is hashed in order to obtain the hash slot. However, since it is possible that there are multiple occurrences of { or }, the algorithm is well specified by the following rules:
- IF the key contains a { character.
- AND IF there is a } character to the right of {.
- AND IF there are one or more characters between the first occurrence of { and the first occurrence of }.
Then instead of hashing the key, only what is between the first occurrence of { and the following first occurrence of } is hashed.
Examples:
- The two keys {user1000}.following and {user1000}.followers will hash to the same hash slot since only the substring user1000 will be hashed in order to compute the hash slot.
- For the key foo{}{bar} the whole key will be hashed as usual since the first occurrence of { is followed by } on the right without characters in the middle.
- For the key foo{{bar}}zap the substring {bar will be hashed, because it is the substring between the first occurrence of { and the first occurrence of } on its right.
- For the key foo{bar}{zap} the substring bar will be hashed, since the algorithm stops at the first valid or invalid (without bytes inside) match of { and }.
- What follows from the algorithm is that if the key starts with {}, it is guaranteed to be hashed as a whole. This is useful when using binary data as key names.
Adding the hash tag exception, the following is an implementation of the HASH_SLOT function in Ruby and C.
Ruby example code:
def HASH_SLOT(key)
s = key.index "{"
if s
e = key.index "}",s+1
if e && e != s+1
key = key[s+1..e-1]
end
end
crc16(key) % 16384
end
C example code:
unsigned int HASH_SLOT(char *key, int keylen) {
int s, e; /* start-end indexes of { and } */
/* Search the first occurrence of '{'. */
for (s = 0; s < keylen; s++)
if (key[s] == '{') break;
/* No '{' ? Hash the whole key. This is the base case. */
if (s == keylen) return crc16(key,keylen) & 16383;
/* '{' found? Check if we have the corresponding '}'. */
for (e = s+1; e < keylen; e++)
if (key[e] == '}') break;
/* No '}' or nothing between {} ? Hash the whole key. */
if (e == keylen || e == s+1) return crc16(key,keylen) & 16383;
/* If we are here there is both a { and a } on its right. Hash
* what is in the middle between { and }. */
return crc16(key+s+1,e-s-1) & 16383;
}
Cluster node attributes
Every node has a unique name in the cluster. The node name is the hex representation of a 160 bit random number, obtained the first time a node is started (usually using /dev/urandom). The node will save its ID in the node configuration file, and will use the same ID forever, or at least as long as the node configuration file is not deleted by the system administrator, or a hard reset is requested via the CLUSTER RESET command.
The node ID is used to identify every node across the whole cluster. It is possible for a given node to change its IP address without any need to also change the node ID. The cluster is also able to detect the change in IP/port and reconfigure using the gossip protocol running over the cluster bus.
The node ID is not the only information associated with each node, but it is the only one that is always globally consistent. Every node also has the following set of associated information. Some of it concerns the cluster configuration details of this specific node, and is eventually consistent across the cluster. Other information, like the last time a node was pinged, is instead local to each node.
Every node maintains the following information about the other nodes it is aware of in the cluster: the node ID, IP and port of the node, a set of flags, the master of the node if it is flagged as a replica, the last time the node was pinged and the last time the pong was received, the current configuration epoch of the node (explained later in this specification), the link state and finally the set of hash slots served.
A detailed explanation of all the node fields is described in the CLUSTER NODES documentation. The CLUSTER NODES command can be sent to any node in the cluster and provides the state of the cluster and the information for each node according to the local view the queried node has of the cluster.
The following is sample output of the CLUSTER NODES command sent to a master node in a small cluster of three nodes.
$ redis-cli cluster nodes
d1861060fe6a534d42d8a19aeb36600e18785e04 127.0.0.1:6379 myself - 0 1318428930 1 connected 0-1364
3886e65cc906bfd9b1f7e7bde468726a052d1dae 127.0.0.1:6380 master - 1318428930 1318428931 2 connected 1365-2729
d289c575dcbc4bdd2931585fd4339089e461a27d 127.0.0.1:6381 master - 1318428931 1318428931 3 connected 2730-4095
In the above listing the different fields are, in order: node ID, address:port, flags, last ping sent, last pong received, configuration epoch, link state, slots. Details about these fields will be covered when we discuss the specific parts of Redis Cluster they relate to.
The cluster bus
Every Redis Cluster node has an additional TCP port for receiving incoming connections from other Redis Cluster nodes. This port will be derived by adding 10000 to the data port or it can be specified with the cluster-port config.
Example 1:
If a Redis node is listening for client connections on port 6379, and you do not set the cluster-port parameter in redis.conf, the cluster bus port 16379 will be opened.
Example 2:
If a Redis node is listening for client connections on port 6379, and you set cluster-port 20000 in redis.conf, the Cluster bus port 20000 will be opened.
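Expressed as a configuration file, the second example corresponds to a redis.conf fragment along these lines (a sketch; the port values are the ones from the example above):

```
port 6379
cluster-enabled yes
cluster-port 20000
```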
Node-to-node communication happens exclusively using the Cluster bus and the Cluster bus protocol: a binary protocol composed of frames of different types and sizes. The Cluster bus binary protocol is not publicly documented since it is not intended for external software to talk with Redis Cluster nodes using this protocol. However, you can obtain more details about the Cluster bus protocol by reading the cluster.h and cluster.c files in the Redis Cluster source code.
Cluster topology
Redis Cluster is a full mesh where every node is connected with every other node using a TCP connection.
In a cluster of N nodes, every node has N-1 outgoing TCP connections, and N-1 incoming connections.
These TCP connections are kept alive all the time and are not created on demand. When a node expects a pong reply in response to a ping in the cluster bus, before waiting long enough to mark the node as unreachable, it will try to refresh the connection with the node by reconnecting from scratch.
While Redis Cluster nodes form a full mesh, nodes use a gossip protocol and a configuration update mechanism in order to avoid exchanging too many messages between nodes during normal conditions, so the number of messages exchanged is not exponential.
Node handshake
Nodes always accept connections on the cluster bus port, and reply to pings even when the pinging node is not trusted. However, all other packets will be discarded by the receiving node if the sending node is not considered part of the cluster.
A node will accept another node as part of the cluster only in two ways:
- If a node presents itself with a MEET message (CLUSTER MEET command). A meet message is exactly like a PING message, but forces the receiver to accept the node as part of the cluster. Nodes will send MEET messages to other nodes only if the system administrator requests this via the following command: CLUSTER MEET ip port
- A node will also register another node as part of the cluster if a node that is already trusted gossips about it. So if A knows B, and B knows C, eventually B will send gossip messages to A about C. When this happens, A will register C as part of the network, and will try to connect with C.
This means that as long as we join nodes in any connected graph, they’ll eventually form a fully connected graph automatically. This means that the cluster is able to auto-discover other nodes, but only if there is a trusted relationship that was forced by the system administrator.
This mechanism makes the cluster more robust but prevents different Redis clusters from accidentally mixing after change of IP addresses or other network related events.
Redirection and resharding
MOVED Redirection
A Redis client is free to send queries to every node in the cluster, including replica nodes. The node will analyze the query, and if it is acceptable (that is, only a single key is mentioned in the query, or the multiple keys mentioned all hash to the same slot) it will look up which node is responsible for the hash slot where the key or keys belong.
If the hash slot is served by the node, the query is simply processed, otherwise the node will check its internal hash slot to node map, and will reply to the client with a MOVED error, like in the following example:
GET x
-MOVED 3999 127.0.0.1:6381
The error includes the hash slot of the key (3999) and the endpoint:port of the instance that can serve the query.
The client needs to reissue the query to the specified node’s endpoint address and port.
The endpoint can be either an IP address, a hostname, or it can be empty (e.g. -MOVED 3999 :6380). An empty endpoint indicates that the server node has an unknown endpoint, and the client should send the next request to the same endpoint as the current request but with the provided port.
Note that even if the client waits a long time before reissuing the query, and in the meantime the cluster configuration changed, the destination node will reply again with a MOVED error if the hash slot 3999 is now served by another node. The same happens if the contacted node had no updated information.
So while from the point of view of the cluster nodes are identified by IDs, we try to simplify our interface with the client by just exposing a map between hash slots and Redis nodes identified by endpoint:port pairs.
The client is not required to, but should try to memorize that hash slot 3999 is served by 127.0.0.1:6381. This way once a new command needs to be issued it can compute the hash slot of the target key and have a greater chance of choosing the right node.
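As a minimal client-side sketch (the helper name is illustrative, not part of any official client library), a client can parse the payload of a MOVED error and record the new slot owner in a local map. Note the leading "-" in the examples is the RESP error type byte, which client libraries strip before exposing the message:

```ruby
# Parse "MOVED <slot> <endpoint>:<port>" and update a local slot map.
# An empty endpoint (e.g. "MOVED 3999 :6380") means "reuse the current
# endpoint, but with the provided port" -- represented here as nil.
def handle_moved(slot_map, error_message)
  _kind, slot, addr = error_message.split(" ")
  host, _sep, port = addr.rpartition(":")
  host = nil if host.empty? # empty endpoint case
  slot_map[Integer(slot)] = [host, Integer(port)]
  slot_map
end
```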
An alternative is to just refresh the whole client-side cluster layout using the CLUSTER SLOTS command when a MOVED redirection is received. When a redirection is encountered, it is likely multiple slots were reconfigured rather than just one, so updating the client configuration as soon as possible is often the best strategy.
Note that when the Cluster is stable (no ongoing changes in the configuration), eventually all the clients will obtain a map of hash slots -> nodes, making the cluster efficient, with clients directly addressing the right nodes without redirections, proxies or other single point of failure entities.
A client must also be able to handle -ASK redirections that are described later in this document, otherwise it is not a complete Redis Cluster client.
Live reconfiguration
Redis Cluster supports the ability to add and remove nodes while the cluster is running. Adding or removing a node is abstracted into the same operation: moving a hash slot from one node to another. This means that the same basic mechanism can be used in order to rebalance the cluster, add or remove nodes, and so forth.
- To add a new node to the cluster an empty node is added to the cluster and some set of hash slots are moved from existing nodes to the new node.
- To remove a node from the cluster the hash slots assigned to that node are moved to other existing nodes.
- To rebalance the cluster a given set of hash slots are moved between nodes.
The core of the implementation is the ability to move hash slots around. From a practical point of view a hash slot is just a set of keys, so what Redis Cluster really does during resharding is move keys from one instance to another. Moving a hash slot means moving all the keys that happen to hash into this hash slot.
To understand how this works we need to show the CLUSTER subcommands that are used to manipulate the slots translation table in a Redis Cluster node.
The following subcommands are available (among others not useful in this case):
- CLUSTER ADDSLOTS slot1 [slot2] … [slotN]
- CLUSTER DELSLOTS slot1 [slot2] … [slotN]
- CLUSTER ADDSLOTSRANGE start-slot1 end-slot1 [start-slot2 end-slot2] … [start-slotN end-slotN]
- CLUSTER DELSLOTSRANGE start-slot1 end-slot1 [start-slot2 end-slot2] … [start-slotN end-slotN]
- CLUSTER SETSLOT slot NODE node
- CLUSTER SETSLOT slot MIGRATING node
- CLUSTER SETSLOT slot IMPORTING node
The first four commands, ADDSLOTS, DELSLOTS, ADDSLOTSRANGE and DELSLOTSRANGE, are simply used to assign slots to (or remove them from) a Redis node. Assigning a slot means telling a given master node that it will be in charge of storing and serving content for the specified hash slot.
After the hash slots are assigned they will propagate across the cluster using the gossip protocol, as specified later in the configuration propagation section.
The ADDSLOTS and ADDSLOTSRANGE commands are usually used when a new cluster is created from scratch to assign each master node a subset of all the 16384 hash slots available.
The DELSLOTS and DELSLOTSRANGE subcommands are mainly used for manual modification of a cluster configuration or for debugging tasks: in practice they are rarely used.
The SETSLOT subcommand is used to assign a slot to a specific node ID if the SETSLOT <slot> NODE form is used. Otherwise the slot can be set in the two special states MIGRATING and IMPORTING. Those two special states are used in order to migrate a hash slot from one node to another.
- When a slot is set as MIGRATING, the node will accept all queries that are about this hash slot, but only if the key in question exists; otherwise the query is forwarded using a -ASK redirection to the node that is the target of the migration.
- When a slot is set as IMPORTING, the node will accept all queries that are about this hash slot, but only if the request is preceded by an ASKING command. If the ASKING command was not given by the client, the query is redirected to the real hash slot owner via a -MOVED redirection error, as would happen normally.
Let’s make this clearer with an example of hash slot migration. Assume that we have two Redis master nodes, called A and B. We want to move hash slot 8 from A to B, so we issue commands like this:
- We send B: CLUSTER SETSLOT 8 IMPORTING A
- We send A: CLUSTER SETSLOT 8 MIGRATING B
All the other nodes will continue to point clients to node “A” every time they are queried with a key that belongs to hash slot 8, so what happens is that:
- All queries about existing keys are processed by “A”.
- All queries about non-existing keys in A are processed by “B”, because “A” will redirect clients to “B”.
This way we no longer create new keys in “A”.
In the meantime, redis-cli, used during reshardings and Redis Cluster configuration, will migrate existing keys in hash slot 8 from A to B. This is performed using the following command:
CLUSTER GETKEYSINSLOT slot count
The above command will return count keys in the specified hash slot.
For the keys returned, redis-cli sends node "A" a MIGRATE command, which will migrate the specified keys from A to B in an atomic way (both instances are locked for the time, usually very short, needed to migrate the keys, so there are no race conditions). This is how MIGRATE works:
MIGRATE target_host target_port "" target_database id timeout KEYS key1 key2 ...
MIGRATE will connect to the target instance, send a serialized version of the key, and once an OK code is received, the old key from its own dataset will be deleted. From the point of view of an external client a key exists either in A or B at any given time.
In Redis Cluster there is no need to specify a database other than 0, but MIGRATE is a general command that can be used for other tasks not involving Redis Cluster.
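The key-migration loop described above can be sketched as follows. This is illustrative, not redis-cli's actual implementation: the node-communication step is injected as a callable (`call` stands in for "send this command to that node"):

```ruby
# Repeatedly fetch a batch of keys still in the slot on the source
# node and atomically MIGRATE them to the target (database 0).
def migrate_slot_keys(call, src, dst_host, dst_port, slot, count: 10, timeout: 5000)
  loop do
    keys = call.(src, "CLUSTER", "GETKEYSINSLOT", slot, count)
    break if keys.empty? # no keys left in the slot: migration done
    call.(src, "MIGRATE", dst_host, dst_port, "", 0, timeout, "KEYS", *keys)
  end
end
```

After the loop terminates, SETSLOT <slot> NODE <node-id> is sent to both nodes (and usually all the others) exactly as the text above describes.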
MIGRATE is optimized to be as fast as possible even when moving complex keys such as long lists, but in Redis Cluster reconfiguring the cluster where big keys are present is not considered a wise procedure if there are latency constraints in the application using the database.
When the migration process is finally finished, the SETSLOT <slot> NODE <node-id> command is sent to the two nodes involved in the migration in order to set the slots back to their normal state. The same command is usually sent to all other nodes to avoid waiting for the natural propagation of the new configuration across the cluster.
ASK redirection
In the previous section, we briefly talked about ASK redirection. Why can’t we simply use MOVED redirection? Because while MOVED means that we think the hash slot is permanently served by a different node and the next queries should be tried against the specified node, ASK means to send only the next query to the specified node.
This is needed because the next query about hash slot 8 can be about a key that is still in A, so we always want the client to try A and then B if needed. Since this happens only for one hash slot out of 16384 available, the performance hit on the cluster is acceptable.
We need to force that client behavior, so to make sure that clients will only try node B after A was tried, node B will only accept queries of a slot that is set as IMPORTING if the client sends the ASKING command before sending the query.
Basically the ASKING command sets a one-time flag on the client that forces a node to serve a query about an IMPORTING slot.
The full semantics of ASK redirection from the point of view of the client is as follows:
- If ASK redirection is received, send only the query that was redirected to the specified node but continue sending subsequent queries to the old node.
- Start the redirected query with the ASKING command.
- Don’t yet update local client tables to map hash slot 8 to B.
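The routing decision the two redirections imply can be sketched as a small pure function (names here are illustrative, not from any real client library):

```ruby
# A MOVED redirection remaps the slot and retries there permanently;
# an ASK redirection retries there exactly once, prefixed with ASKING.
Redirection = Struct.new(:addr, :one_shot)

def parse_redirect(error_message)
  kind, _slot, addr = error_message.split(" ")
  case kind
  when "MOVED" then Redirection.new(addr, false) # update slot map, retry there
  when "ASK"   then Redirection.new(addr, true)  # send ASKING, retry once
  end
end
```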
Once hash slot 8 migration is completed, A will send a MOVED message and the client may permanently map hash slot 8 to the new endpoint and port pair. Note that if a buggy client performs the map earlier this is not a problem since it will not send the ASKING command before issuing the query, so B will redirect the client to A using a MOVED redirection error.
Slots migration is explained in similar terms but with different wording
(for the sake of redundancy in the documentation) in the CLUSTER SETSLOT
command documentation.
Client connections and redirection handling
To be efficient, Redis Cluster clients maintain a map of the current slot configuration. However, this configuration is not required to be up to date. When contacting the wrong node results in a redirection, the client can update its internal slot map accordingly.
Clients usually need to fetch a complete list of slots and mapped node addresses in two different situations:
- At startup, to populate the initial slots configuration.
- When the client receives a MOVED redirection.
Note that a client may handle the MOVED redirection by updating just the moved slot in its table; however this is usually not efficient because often the configuration of multiple slots will be modified at once (for example, if a replica is promoted to master, all of the slots served by the old master will be remapped). It is much simpler to react to a MOVED redirection by fetching the full map of slots to nodes from scratch.
Clients can issue a CLUSTER SLOTS command to retrieve an array of slot ranges and the associated master and replica nodes serving the specified ranges.
The following is an example of output of CLUSTER SLOTS:
127.0.0.1:7000> cluster slots
1) 1) (integer) 5461
2) (integer) 10922
3) 1) "127.0.0.1"
2) (integer) 7001
4) 1) "127.0.0.1"
2) (integer) 7004
2) 1) (integer) 0
2) (integer) 5460
3) 1) "127.0.0.1"
2) (integer) 7000
4) 1) "127.0.0.1"
2) (integer) 7003
3) 1) (integer) 10923
2) (integer) 16383
3) 1) "127.0.0.1"
2) (integer) 7002
4) 1) "127.0.0.1"
2) (integer) 7005
The first two sub-elements of every element of the returned array are the start and end slots of the range. The additional elements represent address-port pairs. The first address-port pair is the master serving the slot, and the additional address-port pairs are the replicas serving the same slot. Replicas will be listed only when not in an error condition (i.e., when their FAIL flag is not set).
The first element in the output above says that slots from 5461 to 10922 (start and end included) are served by 127.0.0.1:7001, and it is possible to scale read-only load by contacting the replica at 127.0.0.1:7004.
CLUSTER SLOTS is not guaranteed to return ranges that cover the full 16384 slots if the cluster is misconfigured, so clients should initialize the slots configuration map filling the target nodes with NULL objects, and report an error if the user tries to execute commands about keys that belong to unassigned slots.
Before returning an error to the caller when a slot is found to be unassigned, the client should try to fetch the slots configuration again to check if the cluster is now configured properly.
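A minimal sketch of this initialization, assuming the reply has already been decoded into nested arrays shaped like the example above (this is not a full client, just the table-building step):

```ruby
# Build a 16384-entry slot table from a CLUSTER SLOTS reply.
# Unassigned slots stay nil so the client can detect them and report
# an error (after retrying the fetch, as described in the text).
def build_slot_table(slots_reply)
  table = Array.new(16384) # nil = unassigned slot
  slots_reply.each do |start_slot, end_slot, master, *_replicas|
    host, port = master
    (start_slot..end_slot).each { |s| table[s] = [host, port] }
  end
  table
end
```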
Multi-key operations
Using hash tags, clients are free to use multi-key operations. For example the following operation is valid:
MSET {user:1000}.name Angela {user:1000}.surname White
Multi-key operations may become unavailable when a resharding of the hash slot the keys belong to is in progress.
More specifically, even during a resharding the multi-key operations targeting keys that all exist and all still hash to the same slot (either the source or destination node) are still available.
Operations on keys that don’t exist, or that are split between the source and destination nodes during the resharding, will generate a -TRYAGAIN error. The client can try the operation again after some time, or report back the error.
As soon as migration of the specified hash slot has terminated, all multi-key operations are available again for that hash slot.
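A client-side retry wrapper for this case might look like the following sketch (illustrative, not from any client library; a real client would also cap the total wait and surface the error to the caller):

```ruby
# Retry a multi-key operation when the cluster answers -TRYAGAIN
# during a resharding, with a small linear backoff and a bounded
# number of attempts.
def with_tryagain_retry(max_attempts: 5)
  attempts = 0
  begin
    yield
  rescue RuntimeError => e
    raise unless e.message.start_with?("TRYAGAIN") && (attempts += 1) < max_attempts
    sleep 0.01 * attempts # back off, then retry the whole operation
    retry
  end
end
```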
Scaling reads using replica nodes
Normally replica nodes will redirect clients to the authoritative master for the hash slot involved in a given command, however clients can use replicas in order to scale reads using the READONLY command.
READONLY tells a Redis Cluster replica node that the client is ok reading possibly stale data and is not interested in running write queries.
When the connection is in readonly mode, the cluster will send a redirection to the client only if the operation involves keys not served by the replica’s master node. This may happen because:
- The client sent a command about hash slots never served by the master of this replica.
- The cluster was reconfigured (for example resharded) and the replica is no longer able to serve commands for a given hash slot.
When this happens the client should update its hash slot map as explained in the previous sections.
The readonly state of the connection can be cleared using the READWRITE command.
Fault Tolerance
Heartbeat and gossip messages
Redis Cluster nodes continuously exchange ping and pong packets. Those two kinds of packets have the same structure, and both carry important configuration information. The only actual difference is the message type field. We’ll refer to the sum of ping and pong packets as heartbeat packets.
Usually nodes send ping packets that will trigger the receivers to reply with pong packets. However this is not necessarily true. It is possible for nodes to just send pong packets to send information to other nodes about their configuration, without triggering a reply. This is useful, for example, in order to broadcast a new configuration as soon as possible.
Usually a node will ping a few random nodes every second so that the total number of ping packets sent (and pong packets received) by each node is a constant amount regardless of the number of nodes in the cluster.
However every node makes sure to ping every other node that hasn’t sent a ping or received a pong for longer than half the NODE_TIMEOUT time. Before NODE_TIMEOUT has elapsed, nodes also try to reconnect the TCP link with another node to make sure nodes are not believed to be unreachable only because there is a problem in the current TCP connection.
The number of messages globally exchanged can be sizable if NODE_TIMEOUT is set to a small figure and the number of nodes (N) is very large, since every node will try to ping every other node for which they don’t have fresh information every half the NODE_TIMEOUT time.
For example in a 100 node cluster with a node timeout set to 60 seconds, every node will try to send 99 pings every 30 seconds, with a total amount of pings of 3.3 per second. Multiplied by 100 nodes, this is 330 pings per second in the total cluster.
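The arithmetic above can be written as a tiny sketch, which also makes explicit that this worst-case ping rate grows quadratically with the cluster size:

```ruby
# Worst-case ping rate: each node pings its N-1 peers at least once
# every NODE_TIMEOUT/2 seconds, and there are N nodes doing so.
def total_pings_per_second(nodes, node_timeout_seconds)
  per_node = (nodes - 1) / (node_timeout_seconds / 2.0) # e.g. 99/30 = 3.3
  per_node * nodes
end
```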
There are ways to lower the number of messages, however there have been no reported issues with the bandwidth currently used by Redis Cluster failure detection, so for now the obvious and direct design is used. Note that even in the above example, the 330 packets per second exchanged are evenly divided among 100 different nodes, so the traffic each node receives is acceptable.
Heartbeat packet content
Ping and pong packets contain a header that is common to all types of packets (for instance packets to request a failover vote), and a special gossip section that is specific to Ping and Pong packets.
The common header has the following information:
- Node ID, a 160 bit pseudorandom string that is assigned the first time a node is created and remains the same for all the life of a Redis Cluster node.
- The currentEpoch and configEpoch fields of the sending node that are used to mount the distributed algorithms used by Redis Cluster (this is explained in detail in the next sections). If the node is a replica the configEpoch is the last known configEpoch of its master.
- The node flags, indicating if the node is a replica, a master, and other single-bit node information.
- A bitmap of the hash slots served by the sending node, or if the node is a replica, a bitmap of the slots served by its master.
- The sender TCP base port, that is, the port used by Redis to accept client commands.
- The cluster port, that is, the port used by Redis for node-to-node communication.
- The state of the cluster from the point of view of the sender (down or ok).
- The master node ID of the sending node, if it is a replica.
Ping and pong packets also contain a gossip section. This section offers to the receiver a view of what the sender node thinks about other nodes in the cluster. The gossip section only contains information about a few random nodes among the set of nodes known to the sender. The number of nodes mentioned in a gossip section is proportional to the cluster size.
For every node added in the gossip section the following fields are reported:
- Node ID.
- IP and port of the node.
- Node flags.
Gossip sections allow receiving nodes to get information about the state of other nodes from the point of view of the sender. This is useful both for failure detection and to discover other nodes in the cluster.
Failure detection
Redis Cluster failure detection is used to recognize when a master or replica node is no longer reachable by the majority of nodes and then respond by promoting a replica to the role of master. When replica promotion is not possible the cluster is put in an error state to stop receiving queries from clients.
As already mentioned, every node maintains a list of flags associated with other known nodes. There are two flags used for failure detection, called PFAIL and FAIL. PFAIL means Possible failure, and is a non-acknowledged failure type. FAIL means that a node is failing and that this condition was confirmed by a majority of masters within a fixed amount of time.
PFAIL flag:
A node flags another node with the PFAIL flag when the node is not reachable for more than NODE_TIMEOUT time. Both master and replica nodes can flag another node as PFAIL, regardless of its type.
The concept of non-reachability for a Redis Cluster node is that we have an active ping (a ping that we sent for which we have yet to get a reply) pending for longer than NODE_TIMEOUT. For this mechanism to work the NODE_TIMEOUT must be large compared to the network round trip time. In order to add reliability during normal operations, nodes will try to reconnect with other nodes in the cluster as soon as half of the NODE_TIMEOUT has elapsed without a reply to a ping. This mechanism ensures that connections are kept alive, so broken connections usually won’t result in false failure reports between nodes.
FAIL flag:
The PFAIL flag alone is just local information every node has about other nodes, but it is not sufficient to trigger a replica promotion. For a node to be considered down the PFAIL condition needs to be escalated to a FAIL condition.
As outlined in the node heartbeats section of this document, every node sends gossip messages to every other node including the state of a few random known nodes. Every node eventually receives a set of node flags for every other node. This way every node has a mechanism to signal other nodes about failure conditions they have detected.
A PFAIL condition is escalated to a FAIL condition when the following set of conditions are met:
- Some node, that we’ll call A, has another node B flagged as PFAIL.
- Node A collected, via gossip sections, information about the state of B from the point of view of the majority of masters in the cluster.
- The majority of masters signaled the PFAIL or FAIL condition within NODE_TIMEOUT * FAIL_REPORT_VALIDITY_MULT time. (The validity factor is set to 2 in the current implementation, so this is just two times the NODE_TIMEOUT time.)
If all the above conditions are true, Node A will:
- Mark the node as FAIL.
- Send a FAIL message (as opposed to a FAIL condition within a heartbeat message) to all the reachable nodes.
The FAIL message will force every receiving node to mark the node in FAIL state, whether or not it already flagged the node in PFAIL state.
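The escalation check can be sketched as follows. The data shapes are illustrative, not the real internal structures: `reports` maps a reporting master's node ID to the last time it signaled PFAIL/FAIL for node B via gossip:

```ruby
# Validity factor as in the current implementation described above.
FAIL_REPORT_VALIDITY_MULT = 2

# Escalate PFAIL to FAIL when a majority of masters reported the
# failure within the validity window; older reports are discarded.
def escalate_to_fail?(reports, total_masters, node_timeout, now)
  window = node_timeout * FAIL_REPORT_VALIDITY_MULT
  fresh = reports.values.count { |t| now - t <= window }
  fresh >= total_masters / 2 + 1 # majority of masters, within the window
end
```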
Note that the FAIL flag is mostly one way. That is, a node can go from PFAIL to FAIL, but a FAIL flag can only be cleared in the following situations:
- The node is already reachable and is a replica. In this case the FAIL flag can be cleared as replicas are not failed over.
- The node is already reachable and is a master not serving any slot. In this case the FAIL flag can be cleared as masters without slots do not really participate in the cluster and are waiting to be configured in order to join the cluster.
- The node is already reachable and is a master, but a long time (N times the NODE_TIMEOUT) has elapsed without any detectable replica promotion. It’s better for it to rejoin the cluster and continue in this case.
It is useful to note that while the PFAIL -> FAIL transition uses a form of agreement, the agreement used is weak:
- Nodes collect views of other nodes over some time period, so even if the majority of master nodes need to “agree”, actually this is just state that we collected from different nodes at different times and we are not sure, nor do we require, that at a given moment the majority of masters agreed. However we discard failure reports which are old, so the failure was signaled by the majority of masters within a window of time.
- While every node detecting the FAIL condition will force that condition on other nodes in the cluster using the FAIL message, there is no way to ensure the message will reach all the nodes. For instance a node may detect the FAIL condition and, because of a partition, not be able to reach any other node.
However the Redis Cluster failure detection has a liveness requirement: eventually all the nodes should agree about the state of a given node. There are two cases that can originate from split brain conditions: either some minority of nodes believe the node is in FAIL state, or a minority of nodes believe the node is not in FAIL state. In both cases the cluster will eventually have a single view of the state of a given node:
Case 1: If a majority of masters have flagged a node as FAIL, because of failure detection and the chain effect it generates, every other node will eventually flag the master as FAIL, since in the specified window of time enough failures will be reported.
Case 2: When only a minority of masters have flagged a node as FAIL, the replica promotion will not happen (as it uses a more formal algorithm that makes sure everybody knows about the promotion eventually) and every node will clear the FAIL state as per the FAIL state clearing rules above (i.e. no promotion after N times the NODE_TIMEOUT has elapsed).
The FAIL flag is only used as a trigger to run the safe part of the algorithm for the replica promotion. In theory a replica may act independently and start a replica promotion when its master is not reachable, and wait for the masters to refuse to provide the acknowledgment if the master is actually reachable by the majority. However the added complexity of the PFAIL -> FAIL state, the weak agreement, and the FAIL message forcing the propagation of the state in the shortest amount of time in the reachable part of the cluster, have practical advantages. Because of these mechanisms, usually all the nodes will stop accepting writes at about the same time if the cluster is in an error state. This is a desirable feature from the point of view of applications using Redis Cluster. Also, erroneous election attempts initiated by replicas that can’t reach their master due to local problems (while the master is otherwise reachable by the majority of other master nodes) are avoided.
Configuration handling, propagation, and failovers
Cluster current epoch
Redis Cluster uses a concept similar to the Raft algorithm “term”. In Redis Cluster the term is called epoch instead, and it is used in order to give incremental versioning to events. When multiple nodes provide conflicting information, it becomes possible for another node to understand which state is the most up to date.
The currentEpoch is a 64 bit unsigned number. At node creation every Redis Cluster node, both replicas and master nodes, sets the currentEpoch to 0. Every time a packet is received from another node, if the epoch of the sender (part of the cluster bus message header) is greater than the local node epoch, the currentEpoch is updated to the sender epoch.
Because of these semantics, eventually all the nodes will agree on the greatest currentEpoch in the cluster.
This information is used when the state of the cluster is changed and a node seeks agreement in order to perform some action.
Currently this happens only during replica promotion, as described in the next section. Basically the epoch is a logical clock for the cluster and dictates that given information wins over one with a smaller epoch.
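The currentEpoch behaves like a logical clock that only moves forward. A minimal sketch (not the Redis source; class and method names are illustrative) of the update rule described above:

```python
class Node:
    def __init__(self):
        # Every node starts with currentEpoch set to 0 at creation.
        self.current_epoch = 0

    def on_packet_received(self, sender_epoch: int) -> None:
        # The sender epoch travels in the cluster bus message header;
        # a greater epoch raises the local value, a smaller one is ignored.
        if sender_epoch > self.current_epoch:
            self.current_epoch = sender_epoch

node = Node()
node.on_packet_received(3)
node.on_packet_received(1)   # stale epochs never lower the local value
assert node.current_epoch == 3
```

Because every received packet can only raise the local value, all reachable nodes converge on the greatest epoch seen anywhere in the cluster.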
Configuration epoch
Every master always advertises its configEpoch in ping and pong packets along with a bitmap advertising the set of slots it serves.
The configEpoch is set to zero in masters when a new node is created.
A new configEpoch is created during replica election. Replicas trying to replace failing masters increment their epoch and try to get authorization from a majority of masters. When a replica is authorized, a new unique configEpoch is created and the replica turns into a master using the new configEpoch.
As explained in the next sections the configEpoch helps to resolve conflicts when different nodes claim divergent configurations (a condition that may happen because of network partitions and node failures).
Replica nodes also advertise the configEpoch field in ping and pong packets, but in the case of replicas the field represents the configEpoch of their master as of the last time they exchanged packets. This allows other instances to detect when a replica has an old configuration that needs to be updated (master nodes will not grant votes to replicas with an old configuration).
Every time the configEpoch changes for some known node, it is permanently stored in the nodes.conf file by all the nodes that receive this information. The same also happens for the currentEpoch value. These two variables are guaranteed to be saved and fsync-ed to disk when updated, before a node continues its operations.
The configEpoch values generated using a simple algorithm during failovers are guaranteed to be new, incremental, and unique.
Replica election and promotion
Replica election and promotion is handled by replica nodes, with the help of master nodes that vote for the replica to promote.
A replica election happens when a master is in FAIL state from the point of view of at least one of its replicas that has the prerequisites in order to become a master.
In order for a replica to promote itself to master, it needs to start an election and win it. All the replicas for a given master can start an election if the master is in FAIL state, however only one replica will win the election and promote itself to master.
A replica starts an election when the following conditions are met:
- The replica’s master is in FAIL state.
- The master was serving a non-zero number of slots.
- The replica’s replication link was disconnected from the master for no longer than a given amount of time, in order to ensure the promoted replica’s data is reasonably fresh. This time is user configurable.
In order to be elected, the first step for a replica is to increment its currentEpoch counter, and request votes from master instances.
Votes are requested by the replica by broadcasting a FAILOVER_AUTH_REQUEST packet to every master node of the cluster. Then it waits for a maximum time of two times the NODE_TIMEOUT for replies to arrive (but always for at least 2 seconds).
Once a master has voted for a given replica, replying positively with a FAILOVER_AUTH_ACK, it can no longer vote for another replica of the same master for a period of NODE_TIMEOUT * 2. In this period it will not be able to reply to other authorization requests for the same master. This is not needed to guarantee safety, but is useful for preventing multiple replicas from getting elected (even if with a different configEpoch) at around the same time, which is usually not wanted.
A replica discards any AUTH_ACK replies with an epoch that is less than the currentEpoch at the time the vote request was sent. This ensures it doesn’t count votes intended for a previous election.
Once the replica receives ACKs from the majority of masters, it wins the election.
Otherwise, if the majority is not reached within the period of two times NODE_TIMEOUT (but always at least 2 seconds), the election is aborted and a new one will be tried again after NODE_TIMEOUT * 4 (and always at least 4 seconds).
Replica rank
As soon as a master is in FAIL state, a replica waits a short period of time before trying to get elected. That delay is computed as follows:
DELAY = 500 milliseconds + random delay between 0 and 500 milliseconds +
REPLICA_RANK * 1000 milliseconds.
The fixed delay ensures that we wait for the FAIL state to propagate across the cluster; otherwise the replica may try to get elected while the masters are still unaware of the FAIL state, refusing to grant their vote.
The random delay is used to desynchronize replicas so they’re unlikely to start an election at the same time.
The REPLICA_RANK is the rank of this replica regarding the amount of replication data it has processed from the master.
Replicas exchange messages when the master is failing in order to establish a (best effort) rank:
the replica with the most updated replication offset is at rank 0, the second most updated at rank 1, and so forth.
In this way the most updated replicas try to get elected before others.
Rank order is not strictly enforced; if a replica of higher rank fails to be elected, the others will try shortly after.
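The delay formula above can be sketched directly (a toy illustration, not the Redis source):

```python
import random

def election_delay_ms(replica_rank: int) -> float:
    """Time a replica waits before starting an election:
    fixed 500 ms + random 0-500 ms + rank * 1000 ms."""
    return 500 + random.uniform(0, 500) + replica_rank * 1000

# The best-offset replica (rank 0) fires first; each lower rank
# starts roughly one second later, so elections rarely collide.
assert 500 <= election_delay_ms(0) <= 1000
assert 1500 <= election_delay_ms(1) <= 2000
```

The 1000 ms per-rank step dominates the 0-500 ms jitter, so a rank-0 replica always attempts its election before a rank-1 replica that observed the failure at the same time.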
Once a replica wins the election, it obtains a new unique and incremental configEpoch which is higher than that of any other existing master. It starts advertising itself as master in ping and pong packets, providing the set of served slots with a configEpoch that will win over the past ones.
In order to speed up the reconfiguration of other nodes, a pong packet is broadcast to all the nodes of the cluster. Currently unreachable nodes will eventually be reconfigured when they receive a ping or pong packet from another node, or will receive an UPDATE packet from another node if the information they publish via heartbeat packets is detected to be out of date.
The other nodes will detect that there is a new master serving the same slots served by the old master but with a greater configEpoch, and will upgrade their configuration. Replicas of the old master (or the failed-over master if it rejoins the cluster) will not just upgrade the configuration but will also reconfigure to replicate from the new master. How nodes rejoining the cluster are configured is explained in the next sections.
Masters reply to replica vote request
In the previous section, we discussed how replicas try to get elected. This section explains what happens from the point of view of a master that is requested to vote for a given replica.
Masters receive requests for votes in the form of FAILOVER_AUTH_REQUEST requests from replicas.
For a vote to be granted the following conditions need to be met:
- A master only votes a single time for a given epoch, and refuses to vote for older epochs: every master has a lastVoteEpoch field and will refuse to vote again as long as the currentEpoch in the auth request packet is not greater than the lastVoteEpoch. When a master replies positively to a vote request, the lastVoteEpoch is updated accordingly, and safely stored on disk.
- A master votes for a replica only if the replica’s master is flagged as FAIL.
- Auth requests with a currentEpoch that is less than the master currentEpoch are ignored. Because of this the master reply will always have the same currentEpoch as the auth request. If the same replica asks to be voted for again, incrementing the currentEpoch, it is guaranteed that an old delayed reply from the master can not be accepted for the new vote.
Example of the issue caused by not using rule number 3:
- Master currentEpoch is 5, lastVoteEpoch is 1 (this may happen after a few failed elections).
- Replica currentEpoch is 3.
- Replica tries to be elected with epoch 4 (3+1); the master replies with an ok with currentEpoch 5, however the reply is delayed.
- Replica will try to be elected again, at a later time, with epoch 5 (4+1); the delayed reply reaches the replica with currentEpoch 5, and is accepted as valid.
- Masters don’t vote for a replica of the same master before NODE_TIMEOUT * 2 has elapsed if a replica of that master was already voted for. This is not strictly required as it is not possible for two replicas to win the election in the same epoch. However, in practical terms it ensures that when a replica is elected it has plenty of time to inform the other replicas, avoiding the possibility that another replica will win a new election and perform an unnecessary second failover.
- Masters make no effort to select the best replica in any way. If the replica’s master is in FAIL state and the master did not vote in the current term, a positive vote is granted. The best replica is the most likely to start an election and win it before the other replicas, since it will usually be able to start the voting process earlier because of its higher rank, as explained in the previous section.
- When a master refuses to vote for a given replica there is no negative response; the request is simply ignored.
- Masters don’t vote for replicas sending a configEpoch that is less than any configEpoch in the master table for the slots claimed by the replica. Remember that the replica sends the configEpoch of its master, and the bitmap of the slots served by its master. This means that the replica requesting the vote must have a configuration for the slots it wants to fail over that is newer than or equal to the one of the master granting the vote.
Practical example of configuration epoch usefulness during partitions
This section illustrates how the epoch concept is used to make the replica promotion process more resistant to partitions.
- A master becomes unreachable for an indefinite amount of time. The master has three replicas A, B, C.
- Replica A wins the election and is promoted to master.
- A network partition makes A not available for the majority of the cluster.
- Replica B wins the election and is promoted as master.
- A partition makes B not available for the majority of the cluster.
- The previous partition is fixed, and A is available again.
At this point B is down and A is available again with a role of master (actually UPDATE messages would reconfigure it promptly, but here we assume all UPDATE messages were lost). At the same time, replica C will try to get elected in order to fail over B. This is what happens:
- C will try to get elected and will succeed, since for the majority of masters its master is actually down. It will obtain a new incremental configEpoch.
- A will not be able to claim to be the master for its hash slots, because the other nodes already have the same hash slots associated with a higher configuration epoch (the one of B) compared to the one published by A.
- So, all the nodes will upgrade their table to assign the hash slots to C, and the cluster will continue its operations.
As you’ll see in the next sections, a stale node rejoining a cluster will usually get notified as soon as possible about the configuration change, because as soon as it pings any other node, the receiver will detect it has stale information and will send an UPDATE message.
Hash slots configuration propagation
An important part of Redis Cluster is the mechanism used to propagate the information about which cluster node is serving a given set of hash slots. This is vital to both the startup of a fresh cluster and the ability to upgrade the configuration after a replica was promoted to serve the slots of its failing master.
The same mechanism allows nodes partitioned away for an indefinite amount of time to rejoin the cluster in a sensible way.
There are two ways hash slot configurations are propagated:
- Heartbeat messages. The sender of a ping or pong packet always adds information about the set of hash slots it (or its master, if it is a replica) serves.
- UPDATE messages. Since every heartbeat packet contains information about the sender’s configEpoch and the set of hash slots served, if a receiver of a heartbeat packet finds the sender information is stale, it will send a packet with the new information, forcing the stale node to update its info.
The receiver of a heartbeat or UPDATE message uses certain simple rules in order to update its table mapping hash slots to nodes. When a new Redis Cluster node is created, its local hash slot table is simply initialized to NULL entries so that each hash slot is not bound or linked to any node. This looks similar to the following:
0 -> NULL
1 -> NULL
2 -> NULL
...
16383 -> NULL
The first rule followed by a node in order to update its hash slot table is the following:
Rule 1: If a hash slot is unassigned (set to NULL), and a known node claims it, I’ll modify my hash slot table and associate the claimed hash slots to it.
So if we receive a heartbeat from node A claiming to serve hash slots 1 and 2 with a configuration epoch value of 3, the table will be modified to:
0 -> NULL
1 -> A [3]
2 -> A [3]
...
16383 -> NULL
When a new cluster is created, a system administrator needs to manually assign (using the CLUSTER ADDSLOTS command, via the redis-cli command line tool, or by any other means) the slots served by each master node only to the node itself, and the information will rapidly propagate across the cluster.
However this rule is not enough. We know that hash slot mapping can change during two events:
- A replica replaces its master during a failover.
- A slot is resharded from a node to a different one.
For now let’s focus on failovers. When a replica fails over its master, it obtains a configuration epoch which is guaranteed to be greater than the one of its master (and more generally greater than any other configuration epoch generated previously). For example node B, which is a replica of A, may failover A with configuration epoch of 4. It will start to send heartbeat packets (the first time mass-broadcasting cluster-wide) and because of the following second rule, receivers will update their hash slot tables:
Rule 2: If a hash slot is already assigned, and a known node is advertising it using a configEpoch that is greater than the configEpoch of the master currently associated with the slot, I’ll rebind the hash slot to the new node.
So after receiving messages from B that claim to serve hash slots 1 and 2 with configuration epoch of 4, the receivers will update their table in the following way:
0 -> NULL
1 -> B [4]
2 -> B [4]
...
16383 -> NULL
Liveness property: because of the second rule, eventually all nodes in the cluster will agree that the owner of a slot is the one with the greatest configEpoch among the nodes advertising it.
This mechanism in Redis Cluster is called last failover wins.
The same happens during resharding. When a node importing a hash slot completes the import operation, its configuration epoch is incremented to make sure the change will be propagated throughout the cluster.
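Rules 1 and 2 amount to a simple per-slot comparison, which can be sketched as follows (a toy table, not the Redis source):

```python
# Entries map slot -> (node, config_epoch); None means unassigned.
NUM_SLOTS = 16384
table = {slot: None for slot in range(NUM_SLOTS)}

def on_slot_claim(slot, node, config_epoch):
    current = table[slot]
    if current is None:
        # Rule 1: an unassigned slot is bound to the first claimer.
        table[slot] = (node, config_epoch)
    elif config_epoch > current[1]:
        # Rule 2: a claimer with a greater configEpoch rebinds the
        # slot ("last failover wins").
        table[slot] = (node, config_epoch)

# A claims slots 1 and 2 with epoch 3; later B claims them with epoch 4.
for s in (1, 2):
    on_slot_claim(s, "A", 3)
for s in (1, 2):
    on_slot_claim(s, "B", 4)
assert table[1] == ("B", 4) and table[2] == ("B", 4)
# A stale claim from A with epoch 3 no longer changes anything.
on_slot_claim(1, "A", 3)
assert table[1] == ("B", 4)
```

Because ties never rebind a slot and greater epochs always win, every node applying these two rules to the same stream of claims converges on the same owner for each slot.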
UPDATE messages, a closer look
With the previous section in mind, it is easier to see how update messages work. Node A may rejoin the cluster after some time. It will send heartbeat packets where it claims it serves hash slots 1 and 2 with a configuration epoch of 3. All the receivers with updated information will instead see that the same hash slots are associated with node B, which has a higher configuration epoch. Because of this they’ll send an UPDATE message to A with the new configuration for the slots. A will update its configuration because of rule 2 above.
How nodes rejoin the cluster
The same basic mechanism is used when a node rejoins a cluster. Continuing with the example above, node A will be notified that hash slots 1 and 2 are now served by B. Assuming that these two were the only hash slots served by A, the count of hash slots served by A will drop to 0! So A will reconfigure to be a replica of the new master.
The actual rule followed is a bit more complex than this. In general it may happen that A rejoins after a long time, and in the meantime hash slots originally served by A may have come to be served by multiple nodes; for example hash slot 1 may be served by B, and hash slot 2 by C.
So the actual Redis Cluster node role switch rule is: A master node will change its configuration to replicate (be a replica of) the node that stole its last hash slot.
During reconfiguration, eventually the number of served hash slots will drop to zero, and the node will reconfigure accordingly. Note that in the base case this just means that the old master will be a replica of the replica that replaced it after a failover. However in the general form the rule covers all possible cases.
Replicas do exactly the same: they reconfigure to replicate the node that stole the last hash slot of their former master.
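The role-switch rule can be sketched in a few lines (an illustration only; the class and field names are invented for the example):

```python
class OldMaster:
    """Tracks the slots a master still serves; when the count drops
    to zero, it replicates the node that took the *last* slot."""

    def __init__(self, served_slots):
        self.served_slots = set(served_slots)
        self.replicating = None  # node we replicate from, if any

    def slot_taken_by(self, slot, new_owner):
        self.served_slots.discard(slot)
        if not self.served_slots:
            # The node that stole the last hash slot becomes our master.
            self.replicating = new_owner

a = OldMaster([1, 2])
a.slot_taken_by(1, "B")   # still serving slot 2: stays a master
assert a.replicating is None
a.slot_taken_by(2, "C")   # last slot gone: A reconfigures as replica of C
assert a.replicating == "C"
```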
Replica migration
Redis Cluster implements a concept called replica migration in order to improve the availability of the system. The idea is that in a cluster with a master-replica setup, if the map between replicas and masters is fixed, availability is limited over time when multiple independent failures of single nodes happen.
For example in a cluster where every master has a single replica, the cluster can continue operations as long as either the master or the replica fails, but not if both fail at the same time. However, there is a class of failures, the independent failures of single nodes caused by hardware or software issues, that can accumulate over time. For example:
- Master A has a single replica A1.
- Master A fails. A1 is promoted as new master.
- Three hours later A1 fails in an independent manner (unrelated to the failure of A). No other replica is available for promotion since node A is still down. The cluster cannot continue normal operations.
If the map between masters and replicas is fixed, the only way to make the cluster more resistant to the above scenario is to add replicas to every master, however this is costly as it requires more instances of Redis to be executed, more memory, and so forth.
An alternative is to create an asymmetry in the cluster, and let the cluster layout automatically change over time. For example the cluster may have three masters A, B, C. A and B have a single replica each, A1 and B1. However the master C is different and has two replicas: C1 and C2.
Replica migration is the process of automatic reconfiguration of a replica in order to migrate to a master that no longer has coverage (no working replicas). With replica migration the scenario mentioned above turns into the following:
- Master A fails. A1 is promoted.
- C2 migrates as a replica of A1, which is otherwise not backed by any replica.
- Three hours later A1 fails as well.
- C2 is promoted as new master to replace A1.
- The cluster can continue the operations.
Replica migration algorithm
The migration algorithm does not use any form of agreement since the replica layout in a Redis Cluster is not part of the cluster configuration that needs to be consistent and/or versioned with config epochs. Instead it uses an algorithm to avoid mass-migration of replicas when a master is not backed. The algorithm guarantees that eventually (once the cluster configuration is stable) every master will be backed by at least one replica.
This is how the algorithm works. To start we need to define what a good replica is in this context: a good replica is a replica not in FAIL state from the point of view of a given node.
The execution of the algorithm is triggered in every replica that detects that there is at least a single master without good replicas. However among all the replicas detecting this condition, only a subset should act. This subset is often actually a single replica, unless different replicas have at a given moment a slightly different view of the failure state of other nodes.
The acting replica is the replica, among the replicas of the master with the maximum number of attached replicas, that is not in FAIL state and has the smallest node ID.
So for example if there are 10 masters with 1 replica each, and 2 masters with 5 replicas each, the replica that will try to migrate is (among the 2 masters having 5 replicas) the one with the lowest node ID. Given that no agreement is used, it is possible that when the cluster configuration is not stable, a race condition occurs where multiple replicas believe themselves to be the non-failing replica with the lowest node ID (this is unlikely to happen in practice). If this happens, the result is multiple replicas migrating to the same master, which is harmless. If the race happens in a way that will leave the ceding master without replicas, as soon as the cluster is stable again the algorithm will be re-executed and will migrate a replica back to the original master.
Eventually every master will be backed by at least one replica. However, the normal behavior is that a single replica migrates from a master with multiple replicas to an orphaned master.
The algorithm is controlled by a user-configurable parameter called cluster-migration-barrier: the number of good replicas a master must be left with before a replica can migrate away. For example, if this parameter is set to 2, a replica can try to migrate only if its master remains with two working replicas.
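The acting-replica selection above can be sketched as follows. This is a simplified illustration under the assumption of a stable view of the cluster (it ignores the cluster-migration-barrier check and FAIL-state filtering, and the function name is invented):

```python
def acting_migrating_replica(masters):
    """Pick the replica that should migrate to an orphaned master.
    `masters` maps master name -> list of good replica node IDs."""
    # Only act if some master is orphaned (has no good replicas).
    if all(replicas for replicas in masters.values()):
        return None
    # Candidate pool: replicas of the master(s) with the most replicas.
    most = max(len(r) for r in masters.values())
    candidates = [rid for r in masters.values() if len(r) == most
                  for rid in r]
    if not candidates:
        return None  # every master is orphaned: nothing can migrate
    # The acting replica is the candidate with the smallest node ID.
    return min(candidates)

# A is orphaned; C has the most replicas, so its lowest-ID replica acts.
layout = {"A": [], "B": ["b1"], "C": ["c1", "c2"]}
assert acting_migrating_replica(layout) == "c1"
```

Choosing the smallest node ID deterministically means every replica with the same view of the cluster picks the same migrator, so no agreement protocol is needed.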
configEpoch conflicts resolution algorithm
When new configEpoch values are created via replica promotion during failovers, they are guaranteed to be unique.
However there are two distinct events where new configEpoch values are created in an unsafe way, just incrementing the local currentEpoch of the local node and hoping there are no conflicts at the same time.
Both events are system-administrator triggered:
- The CLUSTER FAILOVER command with the TAKEOVER option is able to manually promote a replica node into a master without the majority of masters being available. This is useful, for example, in multi data center setups.
- Migration of slots for cluster rebalancing also generates new configuration epochs inside the local node without agreement, for performance reasons.
Specifically, during manual resharding, when a hash slot is migrated from a node A to a node B, the resharding program will force B to upgrade its configuration to an epoch which is the greatest found in the cluster, plus 1 (unless the node is already the one with the greatest configuration epoch), without requiring agreement from other nodes. Usually a real world resharding involves moving several hundred hash slots (especially in small clusters). Requiring an agreement to generate new configuration epochs during resharding, for each hash slot moved, is inefficient. Moreover it requires an fsync in each of the cluster nodes every time in order to store the new configuration. Because of the way it is performed instead, we only need a new config epoch when the first hash slot is moved, making it much more efficient in production environments.
However, because of the two cases above, it is possible (though unlikely) to end up with multiple nodes having the same configuration epoch. A resharding operation performed by the system administrator, and a failover happening at the same time (plus a lot of bad luck) could cause currentEpoch collisions if they are not propagated fast enough.
Moreover, software bugs and filesystem corruptions can also contribute to multiple nodes having the same configuration epoch.
When masters serving different hash slots have the same configEpoch, there are no issues. It is more important that replicas failing over a master have unique configuration epochs.
That said, manual interventions or resharding may change the cluster configuration in different ways. The Redis Cluster main liveness property requires that slot configurations always converge, so under every circumstance we really want all the master nodes to have a different configEpoch.
In order to enforce this, a conflict resolution algorithm is used in the event that two nodes end up with the same configEpoch.
- IF a master node detects another master node is advertising itself with the same configEpoch.
- AND IF the node has a lexicographically smaller Node ID compared to the other node claiming the same configEpoch.
- THEN it increments its currentEpoch by 1, and uses it as the new configEpoch.
If there is any set of nodes with the same configEpoch, all the nodes but the one with the greatest Node ID will move forward, guaranteeing that, eventually, every node will pick a unique configEpoch regardless of what happened.
This mechanism also guarantees that after a fresh cluster is created, all nodes start with a different configEpoch (even if this is not actually used) since redis-cli makes sure to use CLUSTER SET-CONFIG-EPOCH at startup.
However if for some reason a node is left misconfigured, it will update its configuration to a different configuration epoch automatically.
Node resets
Nodes can be software reset (without restarting them) in order to be reused in a different role or in a different cluster. This is useful in normal operations, in testing, and in cloud environments where a given node can be reprovisioned to join a different set of nodes to enlarge or create a new cluster.
In Redis Cluster nodes are reset using the CLUSTER RESET command. The command is provided in two variants:
- CLUSTER RESET SOFT
- CLUSTER RESET HARD
The command must be sent directly to the node to reset. If no reset type is provided, a soft reset is performed.
The following is a list of operations performed by a reset:
- Soft and hard reset: If the node is a replica, it is turned into a master, and its dataset is discarded. If the node is a master and contains keys, the reset operation is aborted.
- Soft and hard reset: All the slots are released, and the manual failover state is reset.
- Soft and hard reset: All the other nodes in the nodes table are removed, so the node no longer knows any other node.
- Hard reset only: currentEpoch, configEpoch, and lastVoteEpoch are set to 0.
- Hard reset only: the Node ID is changed to a new random ID.
Master nodes with non-empty data sets can’t be reset (since normally you want to reshard the data to the other nodes). However, under special conditions when this is appropriate (e.g. when a cluster is totally destroyed with the intent of creating a new one), FLUSHALL must be executed before proceeding with the reset.
Removing nodes from a cluster
It is possible to practically remove a node from an existing cluster by resharding all its data to other nodes (if it is a master node) and shutting it down. However, the other nodes will still remember its node ID and address, and will attempt to connect with it.
For this reason, when a node is removed we want to also remove its entry from all the other nodes’ tables. This is accomplished by using the CLUSTER FORGET <node-id> command.
The command does two things:
- It removes the node with the specified node ID from the nodes table.
- It sets a 60 second ban which prevents a node with the same node ID from being re-added.
The second operation is needed because Redis Cluster uses gossip in order to auto-discover nodes, so removing node X from node A could result in node B gossiping about node X to A again. Because of the 60 second ban, the Redis Cluster administration tools have 60 seconds to remove the node from all the nodes, preventing its re-addition due to auto discovery.
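The ban mechanism is essentially a small blacklist with per-entry expiry. A sketch (not the Redis source; the class, field names, and injected clock are invented for the example):

```python
BAN_SECONDS = 60

class NodesTable:
    def __init__(self):
        self.nodes = {}         # node_id -> address
        self.banned_until = {}  # node_id -> ban expiry timestamp

    def forget(self, node_id, now):
        # CLUSTER FORGET: drop the entry and ban the ID for 60 seconds.
        self.nodes.pop(node_id, None)
        self.banned_until[node_id] = now + BAN_SECONDS

    def gossip_about(self, node_id, address, now):
        # Re-adding via gossip is refused while the ban is active.
        if self.banned_until.get(node_id, 0) > now:
            return
        self.nodes[node_id] = address

t = NodesTable()
t.nodes["X"] = "10.0.0.5:6379"
t.forget("X", now=100)
t.gossip_about("X", "10.0.0.5:6379", now=130)   # still banned: ignored
assert "X" not in t.nodes
t.gossip_about("X", "10.0.0.5:6379", now=200)   # ban expired: re-added
assert "X" in t.nodes
```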
Further information is available in the CLUSTER FORGET documentation.
Publish/Subscribe
In a Redis Cluster, clients can subscribe to every node, and can also publish to every other node. The cluster will make sure that published messages are forwarded as needed.
Clients can send SUBSCRIBE to any node and can also send PUBLISH to any node; the cluster will simply broadcast each published message to all the other nodes.
Redis 7.0 and later features sharded pub/sub, in which shard channels are assigned to slots by the same algorithm used to assign keys to slots. A shard message must be sent to a node that owns the slot the shard channel is hashed to. The cluster makes sure the published shard messages are forwarded to all nodes in the shard, so clients can subscribe to a shard channel by connecting to either the master responsible for the slot, or to any of its replicas.
Appendix
Appendix A: CRC16 reference implementation in ANSI C
/*
* Copyright 2001-2010 Georges Menie (www.menie.org)
* Copyright 2010 Salvatore Sanfilippo (adapted to Redis coding style)
* All rights reserved.
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of the University of California, Berkeley nor the
* names of its contributors may be used to endorse or promote products
* derived from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND ANY
* EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
* WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE REGENTS AND CONTRIBUTORS BE LIABLE FOR ANY
* DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
* SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/* CRC16 implementation according to CCITT standards.
*
* Note by @antirez: this is actually the XMODEM CRC 16 algorithm, using the
* following parameters:
*
* Name : "XMODEM", also known as "ZMODEM", "CRC-16/ACORN"
* Width : 16 bit
* Poly : 1021 (That is actually x^16 + x^12 + x^5 + 1)
* Initialization : 0000
* Reflect Input byte : False
* Reflect Output CRC : False
* Xor constant to output CRC : 0000
* Output for "123456789" : 31C3
*/
static const uint16_t crc16tab[256]= {
0x0000,0x1021,0x2042,0x3063,0x4084,0x50a5,0x60c6,0x70e7,
0x8108,0x9129,0xa14a,0xb16b,0xc18c,0xd1ad,0xe1ce,0xf1ef,
0x1231,0x0210,0x3273,0x2252,0x52b5,0x4294,0x72f7,0x62d6,
0x9339,0x8318,0xb37b,0xa35a,0xd3bd,0xc39c,0xf3ff,0xe3de,
0x2462,0x3443,0x0420,0x1401,0x64e6,0x74c7,0x44a4,0x5485,
0xa56a,0xb54b,0x8528,0x9509,0xe5ee,0xf5cf,0xc5ac,0xd58d,
0x3653,0x2672,0x1611,0x0630,0x76d7,0x66f6,0x5695,0x46b4,
0xb75b,0xa77a,0x9719,0x8738,0xf7df,0xe7fe,0xd79d,0xc7bc,
0x48c4,0x58e5,0x6886,0x78a7,0x0840,0x1861,0x2802,0x3823,
0xc9cc,0xd9ed,0xe98e,0xf9af,0x8948,0x9969,0xa90a,0xb92b,
0x5af5,0x4ad4,0x7ab7,0x6a96,0x1a71,0x0a50,0x3a33,0x2a12,
0xdbfd,0xcbdc,0xfbbf,0xeb9e,0x9b79,0x8b58,0xbb3b,0xab1a,
0x6ca6,0x7c87,0x4ce4,0x5cc5,0x2c22,0x3c03,0x0c60,0x1c41,
0xedae,0xfd8f,0xcdec,0xddcd,0xad2a,0xbd0b,0x8d68,0x9d49,
0x7e97,0x6eb6,0x5ed5,0x4ef4,0x3e13,0x2e32,0x1e51,0x0e70,
0xff9f,0xefbe,0xdfdd,0xcffc,0xbf1b,0xaf3a,0x9f59,0x8f78,
0x9188,0x81a9,0xb1ca,0xa1eb,0xd10c,0xc12d,0xf14e,0xe16f,
0x1080,0x00a1,0x30c2,0x20e3,0x5004,0x4025,0x7046,0x6067,
0x83b9,0x9398,0xa3fb,0xb3da,0xc33d,0xd31c,0xe37f,0xf35e,
0x02b1,0x1290,0x22f3,0x32d2,0x4235,0x5214,0x6277,0x7256,
0xb5ea,0xa5cb,0x95a8,0x8589,0xf56e,0xe54f,0xd52c,0xc50d,
0x34e2,0x24c3,0x14a0,0x0481,0x7466,0x6447,0x5424,0x4405,
0xa7db,0xb7fa,0x8799,0x97b8,0xe75f,0xf77e,0xc71d,0xd73c,
0x26d3,0x36f2,0x0691,0x16b0,0x6657,0x7676,0x4615,0x5634,
0xd94c,0xc96d,0xf90e,0xe92f,0x99c8,0x89e9,0xb98a,0xa9ab,
0x5844,0x4865,0x7806,0x6827,0x18c0,0x08e1,0x3882,0x28a3,
0xcb7d,0xdb5c,0xeb3f,0xfb1e,0x8bf9,0x9bd8,0xabbb,0xbb9a,
0x4a75,0x5a54,0x6a37,0x7a16,0x0af1,0x1ad0,0x2ab3,0x3a92,
0xfd2e,0xed0f,0xdd6c,0xcd4d,0xbdaa,0xad8b,0x9de8,0x8dc9,
0x7c26,0x6c07,0x5c64,0x4c45,0x3ca2,0x2c83,0x1ce0,0x0cc1,
0xef1f,0xff3e,0xcf5d,0xdf7c,0xaf9b,0xbfba,0x8fd9,0x9ff8,
0x6e17,0x7e36,0x4e55,0x5e74,0x2e93,0x3eb2,0x0ed1,0x1ef0
};
uint16_t crc16(const char *buf, int len) {
    int counter;
    uint16_t crc = 0;
    for (counter = 0; counter < len; counter++)
        crc = (crc<<8) ^ crc16tab[((crc>>8) ^ *buf++)&0x00FF];
    return crc;
}
4 - Debugging
Redis is developed with an emphasis on stability. We do our best with every release to make sure you’ll experience a stable product with no crashes. However, if you ever need to debug the Redis process itself, read on.
When Redis crashes, it produces a detailed report of what happened. However, sometimes looking at the crash report is not enough, nor is it possible for the Redis core team to reproduce the issue independently. In this scenario, we need help from the user who can reproduce the issue.
This guide shows how to use GDB to provide the information the Redis developers will need to track the bug more easily.
What is GDB?
GDB is the GNU Debugger: a program that is able to inspect the internal state of another program. Usually tracking and fixing a bug is an exercise in gathering more information about the state of the program at the moment the bug happens, so GDB is an extremely useful tool.
GDB can be used in two ways:
- It can attach to a running program and inspect the state of it at runtime.
- It can inspect the state of a program that already terminated using what is called a core file, that is, the image of the memory at the time the program was running.
From the point of view of investigating Redis bugs we need to use both of these GDB modes. The user able to reproduce the bug attaches GDB to their running Redis instance, and when the crash happens, they create the core file that the developer will then use to inspect the Redis internals at the time of the crash. This way the developer can perform all the inspections on their own computer without the help of the user, and the user is free to restart Redis in their production environment.
Compiling Redis without optimizations
By default Redis is compiled with the -O2 switch, which means that compiler optimizations are enabled. This makes the Redis executable faster, but at the same time it makes Redis (like any other program) harder to inspect using GDB. It is better to attach GDB to Redis compiled without optimizations using the make noopt command (instead of just using the plain make command). However, if you already have Redis running in production, there is no need to recompile and restart it if this is going to create problems on your side. GDB still works against executables compiled with optimizations.
You should not be overly concerned at the loss of performance from compiling Redis without optimizations. It is unlikely that this will cause problems in your environment as Redis is not very CPU-bound.
Attaching GDB to a running process
If you have an already running Redis server, you can attach GDB to it, so that if Redis crashes it will be possible to both inspect the internals and generate a core dump file.
After you attach GDB to the Redis process it will continue running as usual without any loss of performance, so this is not a dangerous procedure.
In order to attach GDB the first thing you need is the process ID of the running Redis instance (the pid of the process). You can easily obtain it using redis-cli:
$ redis-cli info | grep process_id
process_id:58414
In the above example the process ID is 58414.
Log into your Redis server.
(Optional but recommended) Start screen or tmux or any other program that will make sure that your GDB session will not be closed if your SSH connection times out. You can learn more about screen in this article.
Attach GDB to the running Redis server by typing:
$ gdb <path-to-redis-executable> <pid>
For example:
$ gdb /usr/local/bin/redis-server 58414
GDB will start and will attach to the running server printing something like the following:
Reading symbols for shared libraries + done
0x00007fff8d4797e6 in epoll_wait ()
(gdb)
At this point GDB is attached but your Redis instance is blocked by GDB. In order to let the Redis instance continue the execution just type continue at the GDB prompt, and press enter.
(gdb) continue
Continuing.
Done! Now your Redis instance has GDB attached. Now you can wait for the next crash. :)
Now it’s time to detach your screen/tmux session, if you are running GDB inside one, by pressing the Ctrl-a d key combination.
After the crash
Redis has a command to simulate a segmentation fault (in other words a bad crash) called DEBUG SEGFAULT (don’t use it against a real production instance, of course!). So I’ll use this command to crash my instance to show what happens on the GDB side:
(gdb) continue
Continuing.
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0xffffffffffffffff
debugCommand (c=0x7ffc32005000) at debug.c:220
220 *((char*)-1) = 'x';
As you can see GDB detected that Redis crashed, and was even able to show the file name and line number causing the crash. This is already much better than the Redis crash report backtrace (containing just function names and binary offsets).
Obtaining the stack trace
The first thing to do is to obtain a full stack trace with GDB. This is as simple as using the bt command:
(gdb) bt
#0 debugCommand (c=0x7ffc32005000) at debug.c:220
#1 0x000000010d246d63 in call (c=0x7ffc32005000) at redis.c:1163
#2 0x000000010d247290 in processCommand (c=0x7ffc32005000) at redis.c:1305
#3 0x000000010d251660 in processInputBuffer (c=0x7ffc32005000) at networking.c:959
#4 0x000000010d251872 in readQueryFromClient (el=0x0, fd=5, privdata=0x7fff76f1c0b0, mask=220924512) at networking.c:1021
#5 0x000000010d243523 in aeProcessEvents (eventLoop=0x7fff6ce408d0, flags=220829559) at ae.c:352
#6 0x000000010d24373b in aeMain (eventLoop=0x10d429ef0) at ae.c:397
#7 0x000000010d2494ff in main (argc=1, argv=0x10d2b2900) at redis.c:2046
This shows the backtrace, but we also want to dump the processor registers using the info registers command:
(gdb) info registers
rax 0x0 0
rbx 0x7ffc32005000 140721147367424
rcx 0x10d2b0a60 4515891808
rdx 0x7fff76f1c0b0 140735188943024
rsi 0x10d299777 4515796855
rdi 0x0 0
rbp 0x7fff6ce40730 0x7fff6ce40730
rsp 0x7fff6ce40650 0x7fff6ce40650
r8 0x4f26b3f7 1327936503
r9 0x7fff6ce40718 140735020271384
r10 0x81 129
r11 0x10d430398 4517462936
r12 0x4b7c04f8babc0 1327936503000000
r13 0x10d3350a0 4516434080
r14 0x10d42d9f0 4517452272
r15 0x10d430398 4517462936
rip 0x10d26cfd4 0x10d26cfd4 <debugCommand+68>
eflags 0x10246 66118
cs 0x2b 43
ss 0x0 0
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
Please make sure to include both of these outputs in your bug report.
Obtaining the core file
The next step is to generate the core dump, that is, the image of the memory of the running Redis process. This is done using the gcore command:
(gdb) gcore
Saved corefile core.58414
Now you have the core dump to send to the Redis developers, but it is important to understand that it contains all the data that was inside the Redis instance at the time of the crash. Redis developers will make sure not to share the content with anyone else, and will delete the file as soon as it is no longer needed for debugging purposes, but you should be aware that by sending the core file you are sending your data.
What to send to developers
Finally you can send everything to the Redis core team:
- The Redis executable you are using.
- The stack trace produced by the bt command, and the registers dump.
- The core file you generated with gdb.
- Information about the operating system and GCC version, and Redis version you are using.
Thank you
Your help is extremely important! Many issues can only be tracked this way. So thanks!
5 - Redis and the Gopher protocol
Redis contains an implementation of the Gopher protocol, as specified in the RFC 1436.
The Gopher protocol was very popular in the late ’90s. It is an alternative to the web, and both its server- and client-side implementations are so simple that the Redis server needs just 100 lines of code to support it.
What do you do with Gopher nowadays? Well, Gopher never really died, and lately there is a movement to resurrect Gopher's hierarchical content composed of just plain text documents. Some want a simpler internet, others believe that the mainstream internet became too controlled, and it’s cool to create an alternative space for people that want a bit of fresh air.
Anyway, for the 10th birthday of Redis, we gave it the Gopher protocol as a gift.
How it works
The Redis Gopher support uses the inline protocol of Redis, and specifically two kinds of inline requests that were otherwise invalid: an empty request or any request that starts with “/” (there are no Redis commands starting with such a slash). Normal RESP2/RESP3 requests are completely out of the path of the Gopher protocol implementation and are served as usual as well.
If you open a connection to Redis when Gopher is enabled and send it a string like “/foo”, if there is a key named “/foo” it is served via the Gopher protocol.
In order to create a real Gopher “hole” (the name of a Gopher site in Gopher talk), you likely need a script such as the one in https://github.com/antirez/gopher2redis.
SECURITY WARNING
If you plan to put Redis on the internet at a publicly accessible address to serve Gopher pages, make sure to set a password for the instance. Once a password is set:
- The Gopher server (when enabled, which is not the default) will still serve content via Gopher.
- However, other commands cannot be called before the client authenticates.
So use the requirepass option to protect your instance.
To enable Gopher support use the following configuration line.
gopher-enabled yes
Accessing keys that are not strings or do not exist will generate an error in Gopher protocol format.
6 - Redis internals
The following Redis documents were written by the creator of Redis, Salvatore Sanfilippo, early in the development of Redis (c. 2010), and do not necessarily reflect the latest Redis implementation.
6.1 - Event library
Note: this document was written by the creator of Redis, Salvatore Sanfilippo, early in the development of Redis (c. 2010), and does not necessarily reflect the latest Redis implementation.
Why is an Event Library needed at all?
Let us figure it out through a series of Q&As.
Q: What do you expect a network server to be doing all the time?
A: Watch for inbound connections on the port it's listening on and accept them.
Q: Calling accept(2) yields a descriptor. What do I do with it?
A: Save the descriptor and do a non-blocking read/write operation on it.
Q: Why does the read/write have to be non-blocking?
A: If the file operation (even a socket in Unix is a file) is blocking, how could the server, for example, accept other connection requests while it's blocked in a file I/O operation?
Q: I guess I have to do many such non-blocking operations on the socket to see when it’s ready. Am I right?
A: Yes. That is what an event library does for you. Now you get it.
Q: How do Event Libraries do what they do?
A: They use the operating system’s polling facility along with timers.
Q: So are there any open source event libraries that do what you just described?
A: Yes. libevent and libev are two such event libraries that I can recall off the top of my head.
Q: Does Redis use such open source event libraries for handling socket I/O?
A: No. For various reasons Redis uses its own event library.
The Redis event library
Redis implements its own event library. The event library is implemented in ae.c.
The best way to understand how the Redis event library works is to understand how Redis uses it.
Event Loop Initialization
The initServer function defined in redis.c initializes the numerous fields of the redisServer structure variable. One such field is the Redis event loop el:
aeEventLoop *el
initServer initializes the server.el field by calling aeCreateEventLoop defined in ae.c. The definition of aeEventLoop is below:
typedef struct aeEventLoop
{
    int maxfd;
    long long timeEventNextId;
    aeFileEvent events[AE_SETSIZE]; /* Registered events */
    aeFiredEvent fired[AE_SETSIZE]; /* Fired events */
    aeTimeEvent *timeEventHead;
    int stop;
    void *apidata; /* This is used for polling API specific data */
    aeBeforeSleepProc *beforesleep;
} aeEventLoop;
aeCreateEventLoop
aeCreateEventLoop first mallocs the aeEventLoop structure and then calls ae_epoll.c:aeApiCreate.
aeApiCreate mallocs an aeApiState that has two fields: epfd, which holds the epoll file descriptor returned by a call to epoll_create, and events, of type struct epoll_event defined by the Linux epoll library. The use of the events field will be described later.
Next is ae.c:aeCreateTimeEvent. But before that, initServer calls anet.c:anetTcpServer, which creates and returns a listening descriptor. The descriptor listens on port 6379 by default. The returned listening descriptor is stored in the server.fd field.
aeCreateTimeEvent
aeCreateTimeEvent accepts the following parameters:
- eventLoop: This is server.el in redis.c.
- milliseconds: The number of milliseconds from the current time after which the timer expires.
- proc: Function pointer. Stores the address of the function that has to be called after the timer expires.
- clientData: Mostly NULL.
- finalizerProc: Pointer to the function that has to be called before the timed event is removed from the list of timed events.
initServer calls aeCreateTimeEvent to add a timed event to the timeEventHead field of server.el. timeEventHead is a pointer to a list of such timed events. The call to aeCreateTimeEvent from the redis.c:initServer function is given below:
aeCreateTimeEvent(server.el /*eventLoop*/, 1 /*milliseconds*/, serverCron /*proc*/, NULL /*clientData*/, NULL /*finalizerProc*/);
redis.c:serverCron performs many operations that help keep Redis running properly.
aeCreateFileEvent
The essence of the aeCreateFileEvent function is to execute the epoll_ctl system call, which adds a watch for the EPOLLIN event on the listening descriptor created by anetTcpServer and associates it with the epoll descriptor created by the call to aeCreateEventLoop.
Following is an explanation of what precisely aeCreateFileEvent does when called from redis.c:initServer.
initServer passes the following arguments to aeCreateFileEvent:
- server.el: The event loop created by aeCreateEventLoop. The epoll descriptor is obtained from server.el.
- server.fd: The listening descriptor, which also serves as an index to access the relevant file event structure from the eventLoop->events table and to store extra information like the callback function.
- AE_READABLE: Signifies that server.fd has to be watched for the EPOLLIN event.
- acceptHandler: The function that has to be executed when the event being watched for is ready. This function pointer is stored in eventLoop->events[server.fd]->rfileProc.
This completes the initialization of the Redis event loop.
Event Loop Processing
ae.c:aeMain, called from redis.c:main, does the job of processing the event loop that was initialized in the previous phase.
ae.c:aeMain calls ae.c:aeProcessEvents in a while loop that processes pending time and file events.
aeProcessEvents
ae.c:aeProcessEvents looks for the time event that will expire in the smallest amount of time by calling ae.c:aeSearchNearestTimer on the event loop. In our case there is only one timer event in the event loop, the one created by ae.c:aeCreateTimeEvent.
Remember that the timer event created by aeCreateTimeEvent has probably elapsed by now because it had an expiry time of one millisecond. Since the timer has already expired, the seconds and microseconds fields of the tvp timeval structure variable are initialized to zero.
The tvp structure variable, along with the event loop variable, is passed to ae_epoll.c:aeApiPoll.
The aeApiPoll function does an epoll_wait on the epoll descriptor and populates the eventLoop->fired table with the details:
- fd: The descriptor that is now ready to do a read/write operation depending on the mask value.
- mask: The read/write event that can now be performed on the corresponding descriptor.
aeApiPoll returns the number of such file events ready for operation. Now to put things in context: if any client has requested a connection, then aeApiPoll would have noticed it and populated the eventLoop->fired table with an entry whose descriptor is the listening descriptor and whose mask is AE_READABLE.
Now aeProcessEvents calls redis.c:acceptHandler, registered as the callback. acceptHandler executes accept on the listening descriptor, returning a descriptor connected with the client. redis.c:createClient adds a file event on the connected descriptor through a call to ae.c:aeCreateFileEvent like below:
if (aeCreateFileEvent(server.el, c->fd, AE_READABLE,
readQueryFromClient, c) == AE_ERR) {
freeClient(c);
return NULL;
}
c is the redisClient structure variable and c->fd is the connected descriptor.
Next, ae.c:aeProcessEvents calls ae.c:processTimeEvents.
processTimeEvents
ae.c:processTimeEvents iterates over the list of time events starting at eventLoop->timeEventHead.
For every timed event that has elapsed, processTimeEvents calls the registered callback. In this case it calls the only timed event callback registered, that is, redis.c:serverCron. The callback returns the time in milliseconds after which the callback must be called again. This change is recorded via a call to ae.c:aeAddMilliSeconds and will be handled on the next iteration of the ae.c:aeMain while loop.
That’s all.
6.2 - String internals
Note: this document was written by the creator of Redis, Salvatore Sanfilippo, early in the development of Redis (c. 2010), and does not necessarily reflect the latest Redis implementation.
The implementation of Redis strings is contained in sds.c (sds stands for Simple Dynamic Strings). The implementation is available as a standalone library at https://github.com/antirez/sds.
The C structure sdshdr declared in sds.h represents a Redis string:
struct sdshdr {
    long len;
    long free;
    char buf[];
};
The buf character array stores the actual string.
The len field stores the length of buf. This makes obtaining the length of a Redis string an O(1) operation.
The free field stores the number of additional bytes available for use.
Together the len and free fields can be thought of as holding the metadata of the buf character array.
Creating Redis Strings
A new data type named sds is defined in sds.h to be a synonym for a character pointer:
typedef char *sds;
The sdsnewlen function defined in sds.c creates a new Redis String:
sds sdsnewlen(const void *init, size_t initlen) {
    struct sdshdr *sh;

    sh = zmalloc(sizeof(struct sdshdr)+initlen+1);
#ifdef SDS_ABORT_ON_OOM
    if (sh == NULL) sdsOomAbort();
#else
    if (sh == NULL) return NULL;
#endif
    sh->len = initlen;
    sh->free = 0;
    if (initlen) {
        if (init) memcpy(sh->buf, init, initlen);
        else memset(sh->buf,0,initlen);
    }
    sh->buf[initlen] = '\0';
    return (char*)sh->buf;
}
Remember, a Redis string is a variable of type struct sdshdr. But sdsnewlen returns a character pointer!
That’s a trick and needs some explanation.
Suppose I create a Redis string using sdsnewlen like below:
sdsnewlen("redis", 5);
This creates a new variable of type struct sdshdr, allocating memory for the len and free fields as well as for the buf character array.
sh = zmalloc(sizeof(struct sdshdr)+initlen+1); // initlen is length of init argument.
After sdsnewlen successfully creates a Redis string the result is something like:
-----------
|5|0|redis|
-----------
^ ^
sh sh->buf
sdsnewlen returns sh->buf to the caller.
What do you do if you need to free the Redis string pointed to by sh?
You want the pointer sh but you only have the pointer sh->buf.
Can you get the pointer sh from sh->buf?
Yes. Pointer arithmetic. Notice from the above ASCII art that if you subtract the size of two longs from sh->buf you get the pointer sh.
The sizeof two longs happens to be the size of struct sdshdr.
Look at the sdslen function and see this trick at work:
size_t sdslen(const sds s) {
    struct sdshdr *sh = (void*) (s-(sizeof(struct sdshdr)));
    return sh->len;
}
Knowing this trick you could easily go through the rest of the functions in sds.c.
The Redis string implementation is hidden behind an interface that accepts only character pointers. The users of Redis strings need not care about how it’s implemented and can treat Redis strings as a character pointer.
6.3 - Virtual memory (deprecated)
Note: this document was written by the creator of Redis, Salvatore Sanfilippo, early in the development of Redis (c. 2010). Virtual Memory has been deprecated since Redis 2.6, so this documentation is here only for historical interest.
This document details the internals of the Redis Virtual Memory subsystem prior to Redis 2.6. The intended audience is not the final user but programmers willing to understand or modify the Virtual Memory implementation.
Keys vs Values: what is swapped out?
The goal of the VM subsystem is to free memory by transferring Redis Objects from memory to disk. This is a very generic mechanism, but specifically, Redis transfers only objects associated with values. In order to understand this concept better we’ll show, using the DEBUG command, how a key holding a value looks from the point of view of the Redis internals:
redis> set foo bar
OK
redis> debug object foo
Key at:0x100101d00 refcount:1, value at:0x100101ce0 refcount:1 encoding:raw serializedlength:4
As you can see from the above output, the Redis top level hash table maps Redis Objects (keys) to other Redis Objects (values). The Virtual Memory is only able to swap values to disk; the objects associated with keys are always kept in memory. This trade-off guarantees very good lookup performance, as one of the main design goals of the Redis VM is to have performance similar to Redis with VM disabled when the frequently used part of the dataset fits in RAM.
What does a swapped value look like internally
When an object is swapped out, this is what happens in the hash table entry:
- The key continues to hold a Redis Object representing the key.
- The value is set to NULL
So you may wonder where we store the information that a given value (associated to a given key) was swapped out. Just in the key object!
This is how the Redis Object structure robj looks:
/* The actual Redis Object */
typedef struct redisObject {
void *ptr;
unsigned char type;
unsigned char encoding;
unsigned char storage; /* If this object is a key, where is the value?
* REDIS_VM_MEMORY, REDIS_VM_SWAPPED, ... */
unsigned char vtype; /* If this object is a key, and value is swapped out,
* this is the type of the swapped out object. */
int refcount;
/* VM fields, this are only allocated if VM is active, otherwise the
* object allocation function will just allocate
* sizeof(redisObject) minus sizeof(redisObjectVM), so using
* Redis without VM active will not have any overhead. */
struct redisObjectVM vm;
} robj;
As you can see there are a few fields about VM. The most important one is storage, which can be one of these values:
- REDIS_VM_MEMORY: the associated value is in memory.
- REDIS_VM_SWAPPED: the associated value is swapped, and the value entry of the hash table is just set to NULL.
- REDIS_VM_LOADING: the value is swapped on disk, the entry is NULL, but there is a job to load the object from the swap to memory (this field is only used when threaded VM is active).
- REDIS_VM_SWAPPING: the value is in memory, the entry is a pointer to the actual Redis Object, but there is an I/O job in order to transfer this value to the swap file.
If an object is swapped on disk (REDIS_VM_SWAPPED or REDIS_VM_LOADING), how do we know where it is stored, what type it is, and so forth? That’s simple: the vtype field is set to the original type of the swapped Redis object, while the vm field (that is a redisObjectVM structure) holds information about the location of the object. This is the definition of this additional structure:
/* The VM object structure */
struct redisObjectVM {
    off_t page;      /* the page at which the object is stored on disk */
    off_t usedpages; /* number of pages used on disk */
    time_t atime;    /* Last access time */
} vm;
As you can see the structure contains the page at which the object is located in the swap file, the number of pages used, and the last access time of the object (this is very useful for the algorithm that selects good candidates for swapping, as we want to transfer to disk objects that are rarely accessed).
As you can see, while all the other fields use unused bytes in the old Redis Object structure (we had some free bits due to natural memory alignment concerns), the vm field is new, and indeed uses additional memory. Should we pay such a memory cost even when VM is disabled? No! This is the code to create a new Redis Object:
... some code ...
if (server.vm_enabled) {
pthread_mutex_unlock(&server.obj_freelist_mutex);
o = zmalloc(sizeof(*o));
} else {
o = zmalloc(sizeof(*o)-sizeof(struct redisObjectVM));
}
... some code ...
As you can see, if the VM system is not enabled we allocate just sizeof(*o)-sizeof(struct redisObjectVM) of memory. Given that the vm field is the last in the object structure, and that these fields are never accessed if VM is disabled, we are safe and Redis without VM does not pay the memory overhead.
The Swap File
The next step in order to understand how the VM subsystem works is understanding how objects are stored inside the swap file. The good news is that it's not some kind of special format: we just use the same format used to store the objects in .rdb files, the usual dump files produced by Redis using the SAVE command.
The swap file is composed of a given number of pages, where every page size is a given number of bytes. These parameters can be changed in redis.conf, since different Redis instances may work better with different values: it depends on the actual data you store inside it. The following are the default values:
vm-page-size 32
vm-pages 134217728
Redis keeps a “bitmap” (a contiguous array of bits set to zero or one) in memory; every bit represents a page of the swap file on disk. If a given bit is set to 1, it represents a page that is already used (there is some Redis Object stored there), while if the corresponding bit is zero, the page is free.
Keeping this bitmap (that we'll call the page table) in memory is a huge win in terms of performance, and the memory used is small: we just need 1 bit for every page on disk. For instance, with the defaults above, 134217728 pages of 32 bytes each (a 4 GB swap file) use just 16 MB of RAM for the page table.
Transferring objects from memory to swap
In order to transfer an object from memory to disk we need to perform the following steps (assuming non-threaded VM, just a simple blocking approach):
- Find how many pages are needed in order to store this object in the swap file. This is trivially accomplished by calling the function rdbSavedObjectPages, which returns the number of pages used by an object on disk. Note that this function does not duplicate the .rdb saving code just to understand what the length will be after the object is saved on disk; we use the trick of opening /dev/null and writing the object there, finally calling ftello in order to check the amount of bytes required. What we do basically is to save the object on a virtual very fast file, that is, /dev/null.
- Now that we know how many pages are required in the swap file, we need to find this number of contiguous free pages inside the swap file. This task is accomplished by the vmFindContiguousPages function. As you can guess this function may fail if the swap is full, or so fragmented that we can’t easily find the required number of contiguous free pages. When this happens we just abort the swapping of the object, which will continue to live in memory.
- Finally we can write the object on disk, at the specified position, just calling the function vmWriteObjectOnSwap.
As you can guess, once the object has been correctly written to the swap file, it is freed from memory, the storage field in the associated key is set to REDIS_VM_SWAPPED, and the used pages are marked as used in the page table.
Loading objects back in memory
Loading an object from swap to memory is simpler, as we already know where the object is located and how many pages it is using. We also know the type of the object (the loading functions are required to know this information, as there is no header or any other information about the object type on disk), but this is stored in the vtype field of the associated key as already seen above.
Calling the function vmLoadObject, passing the key object associated with the value object we want to load back, is enough. The function will also take care of fixing the storage type of the key (that will be REDIS_VM_MEMORY), marking the pages as freed in the page table, and so forth.
The return value of the function is the loaded Redis Object itself, which we'll have to set again as the value in the main hash table (instead of the NULL value we put in place of the object pointer when the value was originally swapped out).
How blocking VM works
Now we have all the building blocks in order to describe how the blocking VM works. First of all, an important detail about configuration: in order to enable blocking VM in Redis, server.vm_max_threads must be set to zero.
We’ll see later how this max number of threads is used in the threaded VM; for now all you need to know is that Redis reverts to fully blocking VM when this is set to zero.
We also need to introduce another important VM parameter, that is, server.vm_max_memory. This parameter is very important as it is used in order to trigger swapping: Redis will try to swap objects only if it is using more memory than the max memory setting, otherwise there is no need to swap as we are matching the user-requested memory usage.
Blocking VM swapping
Swapping of objects from memory to disk happens in the cron function. This function used to be called every second, while in the recent Redis versions on git it is called every 100 milliseconds (that is, 10 times per second).
If this function detects we are out of memory, that is, the memory used is greater than the vm-max-memory setting, it starts transferring objects from memory to disk in a loop calling the function vmSwapOneObject. This function takes just one argument: if 0 it will swap objects in a blocking way, otherwise if it is 1, I/O threads are used. In the blocking scenario we just call it with zero as the argument.
vmSwapOneObject performs the following steps:
- The key space is inspected in order to find a good candidate for swapping (we’ll see later what a good candidate for swapping is).
- The associated value is transferred to disk, in a blocking way.
- The key storage field is set to REDIS_VM_SWAPPED, while the vm fields of the object are set to the right values (the page index where the object was swapped, and the number of pages used to swap it).
- Finally the value object is freed and the value entry of the hash table is set to NULL.
The function is called again and again until one of the following happens: there is no way to swap more objects because either the swap file is full or nearly all the objects are already transferred on disk, or simply the memory usage is already under the vm-max-memory parameter.
What values to swap when we are out of memory?
Understanding what’s a good candidate for swapping is not too hard. A few objects at random are sampled, and for each their swappability is computed as:
swappability = age*log(size_in_memory)
The age is the number of seconds the key was not requested, while size_in_memory is a fast estimation of the amount of memory (in bytes) used by the object in memory. So we try to swap out objects that are rarely accessed, and we try to swap bigger objects over smaller ones, but the latter is a less important factor (because of the logarithmic function used). This is because we don’t want bigger objects to be swapped out and in too often, as the bigger the object the more I/O and CPU is required in order to transfer it.
Blocking VM loading
What happens if an operation against a key associated with a swapped out object is requested? For instance Redis may just happen to process the following command:
GET foo
If the value object of the foo key is swapped, we need to load it back in memory before processing the operation. In Redis the key lookup process is centralized in the lookupKeyRead and lookupKeyWrite functions. These two functions are used in the implementation of all the Redis commands accessing the keyspace, so we have a single point in the code where we can handle the loading of the key from the swap file to memory.
So this is what happens:
- The user calls some command having as argument a swapped key
- The command implementation calls the lookup function
- The lookup function searches for the key in the top level hash table. If the value associated with the requested key is swapped (we can see that by checking the storage field of the key object), we load it back in memory in a blocking way before returning to the user.
This is pretty straightforward, but things will get more interesting with the threads. From the point of view of the blocking VM the only real problem is saving the dataset using another process, that is, handling the BGSAVE and BGREWRITEAOF commands.
Background saving when VM is active
The default Redis way to persist on disk is to create .rdb files using a child process. Redis calls the fork() system call in order to create a child, that has the exact copy of the in memory dataset, since fork duplicates the whole program memory space (actually thanks to a technique called Copy on Write memory pages are shared between the parent and child process, so the fork() call will not require too much memory).
In the child process we have a copy of the dataset at a given point in time. Other commands issued by clients will just be served by the parent process and will not modify the child data.
The child process will just store the whole dataset into the dump.rdb file and finally will exit. But what happens when VM is active? Values can be swapped out so we don’t have all the data in memory, and we need to access the swap file in order to retrieve the swapped values. While the child process is saving, the swap file is shared between the parent and child process, since:
- The parent process needs to access the swap file in order to load values back into memory if an operation against swapped out values is performed.
- The child process needs to access the swap file in order to retrieve the full dataset while saving the data set on disk.
In order to avoid problems while both processes are accessing the same swap file we do a simple thing, that is, not allowing values to be swapped out in the parent process while a background saving is in progress. This way both processes access the swap file in a read-only fashion. This approach has the problem that while the child process is saving, no new values can be transferred to the swap file even if Redis is using more memory than the vm-max-memory parameter dictates. This is usually not a problem, as the background saving will terminate in a short amount of time and, if still needed, a percentage of values will be swapped to disk ASAP.
An alternative to this scenario is to enable the Append Only File, which will have this problem only when a log rewrite is performed using the BGREWRITEAOF command.
The problem with the blocking VM
The problem with blocking VM is that… it’s blocking :) This is not a problem when Redis is used for batch processing, but for real-time usage one of the good points of Redis is its low latency. The blocking VM has bad latency behavior: when a client is accessing a swapped out value, or when Redis needs to swap out values, no other clients are served in the meantime.
Swapping out keys should happen in background. Similarly when a client is accessing a swapped out value other clients accessing in memory values should be served mostly as fast as when VM is disabled. Only the clients dealing with swapped out keys should be delayed.
All these limitations called for a non-blocking VM implementation.
Threaded VM
There are basically three main ways to turn the blocking VM into a non-blocking one.
- 1: One way is obvious, and in my opinion, not a good idea at all: turning Redis itself into a threaded server. If every request is served by a different thread, other clients automatically don’t need to wait for blocked ones. Redis is fast, exports atomic operations, has no locks, and is just 10k lines of code because it is single threaded, so this was not an option for me.
- 2: Using non-blocking I/O against the swap file. After all, Redis is already event-loop based, so why not just handle disk I/O in a non-blocking fashion? I also discarded this possibility, for two main reasons. One is that non-blocking file operations, unlike sockets, are an incompatibility nightmare. It’s not just a matter of calling select; you need to use OS-specific things. The other problem is that the I/O is just one part of the time consumed to handle the VM; another big part is the CPU used in order to encode/decode data to/from the swap file. This is why I picked option three, that is…
- 3: Using I/O threads, that is, a pool of threads handling the swap I/O operations. This is what the Redis VM is using, so let’s detail how this works.
I/O Threads
The threaded VM design goals were the following, in order of importance:
- Simple implementation, little room for race conditions, simple locking, VM system more or less completely decoupled from the rest of Redis code.
- Good performance, no locks for clients accessing values in memory.
- Ability to decode/encode objects in the I/O threads.
The above goals resulted in an implementation where the Redis main thread (the one serving actual clients) and the I/O threads communicate using a queue of jobs, with a single mutex.
Basically when the main thread requires some work to be done in the background by some I/O thread, it pushes an I/O job structure onto the server.io_newjobs queue (that is, just a linked list). If there are no active I/O threads, one is started. At this point some I/O thread will process the I/O job, and the result of the processing is pushed onto the server.io_processed queue. The I/O thread will then send a byte using a UNIX pipe to the main thread in order to signal that a new job was processed and the result is ready to be handled.
This is how the iojob structure looks:
typedef struct iojob {
int type; /* Request type, REDIS_IOJOB_* */
redisDb *db;/* Redis database */
robj *key; /* This I/O request is about swapping this key */
robj *val; /* the value to swap for REDIS_IOREQ_*_SWAP, otherwise this
* field is populated by the I/O thread for REDIS_IOREQ_LOAD. */
off_t page; /* Swap page where to read/write the object */
off_t pages; /* Swap pages needed to save object. PREPARE_SWAP return val */
int canceled; /* True if this command was canceled by blocking side of VM */
pthread_t thread; /* ID of the thread processing this entry */
} iojob;
There are just three types of jobs that an I/O thread can perform (the type is specified by the type field of the structure):
- REDIS_IOJOB_LOAD: load the value associated with a given key from swap to memory. The object offset inside the swap file is page, the object type is key->vtype. The result of this operation will populate the val field of the structure.
- REDIS_IOJOB_PREPARE_SWAP: compute the number of pages needed in order to save the object pointed to by val into the swap. The result of this operation will populate the pages field.
- REDIS_IOJOB_DO_SWAP: transfer the object pointed to by val to the swap file, at page offset page.
The main thread delegates just the above three tasks. All the rest is handled by the I/O thread itself, for instance finding a suitable range of free pages in the swap file page table (that is a fast operation), deciding what object to swap, altering the storage field of a Redis object to reflect the current state of a value.
Non blocking VM as probabilistic enhancement of blocking VM
So now we have a way to request background jobs dealing with slow VM operations. How do we add this to the mix of the rest of the work done by the main thread? While the blocking VM was aware that an object was swapped out only when the object was looked up, this is too late for us: in C it is not trivial to start a background job in the middle of a command, leave the function, and re-enter the computation at the same point when the I/O thread has finished what we requested (that is, there are no coroutines, continuations, or the like).
Fortunately there was a much, much simpler way to do this. And we love simple things: basically, consider the VM implementation a blocking one, but add an optimization (using the non-blocking VM operations we are able to perform) to make the blocking very unlikely.
This is what we do:
- Every time a client sends us a command, before the command is executed, we examine the argument vector of the command in search for swapped keys. After all we know for every command what arguments are keys, as the Redis command format is pretty simple.
- If we detect that at least a key in the requested command is swapped on disk, we block the client instead of really issuing the command. For every swapped value associated to a requested key, an I/O job is created, in order to bring the values back in memory. The main thread continues the execution of the event loop, without caring about the blocked client.
- In the meanwhile, I/O threads are loading values in memory. Every time an I/O thread finishes loading a value, it sends a byte to the main thread using a UNIX pipe. The pipe file descriptor has a readable event associated in the main thread event loop: the function vmThreadedIOCompletedJob. If this function detects that all the values needed for a blocked client were loaded, the client is restarted and the original command called.
So you can think of this as a blocking VM that almost always happens to have the right keys in memory, since we pause clients that are going to issue commands about swapped out values until these values are loaded.
If the function checking which arguments are keys fails in some way, there is no problem: the lookup function will see that a given key is associated with a swapped out value and will block while loading it. So our non-blocking VM reverts to a blocking one when it is not possible to anticipate what keys are touched.
For instance, in the case of the SORT command used together with the GET or BY options, it is not trivial to know beforehand what keys will be requested, so at least in the first implementation, SORT BY/GET resorts to the blocking VM implementation.
Blocking clients on swapped keys
How do we block clients? Suspending a client in an event-loop based server is pretty trivial: all we do is cancel its read handler. Sometimes we do something different (for instance for BLPOP), that is, just marking the client as blocked, but not processing new data (just accumulating the new data into input buffers).
Aborting I/O jobs
There is something hard to solve about the interactions between our blocking and non-blocking VM: what happens if a blocking operation starts for a key that is also the target of a non-blocking operation at the same time?
For instance while SORT BY is executed, a few keys are being loaded in a blocking manner by the sort command. At the same time, another client may request the same keys with a simple GET key command, that will trigger the creation of an I/O job to load the key in background.
The only simple way to deal with this problem is to be able to kill I/O jobs in the main thread, so that if a key that we want to load or swap in a blocking way is in the REDIS_VM_LOADING
or REDIS_VM_SWAPPING
state (that is, there is an I/O job about this key), we can just kill the I/O job about this key, and go ahead with the blocking operation we want to perform.
This is not as trivial as it sounds. At a given moment an I/O job can be in one of the following three queues:
- server.io_newjobs: the job was already queued but no thread is handling it.
- server.io_processing: the job is being processed by an I/O thread.
- server.io_processed: the job was already processed.
The function able to kill an I/O job is vmCancelThreadedIOJob, and this is what it does:
- If the job is in the newjobs queue, that’s simple: removing the iojob structure from the queue is enough, as no thread is executing any operation on it yet.
- If the job is in the processing queue, a thread is messing with our job (and possibly with the associated object!). The only thing we can do is wait for the item to move to the next queue in a blocking way. Fortunately this condition happens very rarely, so it’s not a performance problem.
- If the job is in the processed queue, we just mark it as canceled by setting the canceled field to 1 in the iojob structure. The function processing completed jobs will simply ignore and free the job instead of really processing it.
Questions?
This document is in no way complete; the only way to get the whole picture is reading the source code, but it should be a good introduction to make the code review and understanding a lot simpler.
Something is not clear about this page? Please leave a comment and I’ll try to address the issue possibly integrating the answer in this document.
6.4 - Redis design draft #2 (historical)
Note: this document was written by the creator of Redis, Salvatore Sanfilippo, early in the development of Redis (c. 2013), as part of a series of design drafts. This is preserved for historical interest.
Redis Design Draft 2 – RDB version 7 info fields
- Author: Salvatore Sanfilippo
antirez@gmail.com
- GitHub issue #1048
History of revisions
1.0, 10 April 2013 - Initial draft.
Overview
The Redis RDB format lacks a simple way to add info fields to an RDB file without causing a backward compatibility issue, even if the added metadata is not required in order to load data from the RDB file.
For example thanks to the info fields specified in this document it will be possible to add to RDB information like file creation time, Redis version generating the file, and any other useful information, in a way that not every field is required for an RDB version 7 file to be correctly processed.
Also with minimal changes it will be possible to add RDB version 7 support to Redis 2.6 without actually supporting the additional fields but just skipping them when loading an RDB file.
RDB info fields may have semantic meaning if needed, so that the presence of the field may add information about the data set specified in the RDB file format. However, when an info field is required to be correctly decoded in order to understand and load the data set content of the RDB file, the RDB version must be increased so that previous versions of Redis will not attempt to load it.
However currently the info fields are designed to only hold additional information that are not useful to load the dataset, but can better specify how the RDB file was created.
Info fields representation
The RDB format 6 has the following layout:
- A 9 bytes magic “REDIS0006”
- key-value pairs
- An EOF opcode
- CRC64 checksum
The proposal for RDB format 7 is to add the optional fields immediately after the first 9 bytes magic, so that the new format will be:
- A 9 bytes magic “REDIS0007”
- Info field 1
- Info field 2
- …
- Info field N
- Info field end-of-fields
- key-value pairs
- An EOF opcode
- CRC64 checksum
Every single info field has the following structure:
- A 16 bit identifier
- A 64 bit data length
- A data section of the exact length as specified
Both the identifier and the data length are stored in little endian byte ordering.
The special identifier 0 means that there are no other info fields, and that the remainder of the RDB file contains the key-value pairs.
Handling of info fields
A program can simply skip every info field it does not understand, as long as the RDB version matches the one that it is capable of loading.
Specification of info field IDs and content
Info field 0 – End of info fields
This just means there are no more info fields to process.
Info field 1 – Creation date
This field represents the unix time at which the RDB file was created. The format of the unix time is a 64 bit little endian integer representing seconds since 1st January 1970.
Info field 2 – Redis version
This field represents a null-terminated string containing the Redis version that generated the file, as displayed in the Redis version INFO field.
7 - Redis modules API
The modules documentation is composed of the following pages:
- Introduction to Redis modules (this file). An overview of the Redis modules system and API. It’s a good idea to start your reading here.
- Implementing native data types covers the implementation of native data types into modules.
- Blocking operations shows how to write blocking commands that will not reply immediately, but will block the client, without blocking the Redis server, and will provide a reply whenever it becomes possible.
- Redis modules API reference is generated from module.c top comments of RedisModule functions. It is a good reference in order to understand how each function works.
Redis modules make it possible to extend Redis functionality using external modules, rapidly implementing new Redis commands with features similar to what can be done inside the core itself.
Redis modules are dynamic libraries that can be loaded into Redis at
startup, or using the MODULE LOAD
command. Redis exports a C API, in the
form of a single C header file called redismodule.h
. Modules are meant
to be written in C, however it will be possible to use C++ or other languages
that have C binding functionalities.
Modules are designed in order to be loaded into different versions of Redis, so a given module does not need to be designed, or recompiled, in order to run with a specific version of Redis. For this reason, the module will register to the Redis core using a specific API version. The current API version is “1”.
This document is about an alpha version of Redis modules. API, functionalities and other details may change in the future.
Loading modules
In order to test the module you are developing, you can load the module
using the following redis.conf
configuration directive:
loadmodule /path/to/mymodule.so
It is also possible to load a module at runtime using the following command:
MODULE LOAD /path/to/mymodule.so
In order to list all loaded modules, use:
MODULE LIST
Finally, you can unload (and later reload if you wish) a module using the following command:
MODULE UNLOAD mymodule
Note that mymodule
above is not the filename without the .so
suffix, but
instead, the name the module used to register itself into the Redis core.
The name can be obtained using MODULE LIST
. However it is good practice
that the filename of the dynamic library is the same as the name the module
uses to register itself into the Redis core.
The simplest module you can write
In order to show the different parts of a module, here we’ll show a very simple module that implements a command that outputs a random number.
#include "redismodule.h"
#include <stdlib.h>
int HelloworldRand_RedisCommand(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
RedisModule_ReplyWithLongLong(ctx,rand());
return REDISMODULE_OK;
}
int RedisModule_OnLoad(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
if (RedisModule_Init(ctx,"helloworld",1,REDISMODULE_APIVER_1)
== REDISMODULE_ERR) return REDISMODULE_ERR;
if (RedisModule_CreateCommand(ctx,"helloworld.rand",
HelloworldRand_RedisCommand, "fast random",
0, 0, 0) == REDISMODULE_ERR)
return REDISMODULE_ERR;
return REDISMODULE_OK;
}
The example module has two functions. One implements a command called
HELLOWORLD.RAND. This function is specific to that module. However the
other function called RedisModule_OnLoad()
must be present in each
Redis module. It is the entry point for the module to be initialized,
register its commands, and potentially other private data structures
it uses.
Note that it is a good idea for modules to name commands with the name of the module followed by a dot, and finally the command name, like in the case of HELLOWORLD.RAND. This way it is less likely to have collisions.
Note that if different modules have colliding commands, they’ll not be
able to work in Redis at the same time, since the function
RedisModule_CreateCommand
will fail in one of the modules, so the module
loading will abort returning an error condition.
Module initialization
The above example shows the usage of the function RedisModule_Init()
.
It should be the first function called by the module OnLoad
function.
The following is the function prototype:
int RedisModule_Init(RedisModuleCtx *ctx, const char *modulename,
int module_version, int api_version);
The Init function announces to the Redis core that the module has a given name, a version (that is reported by MODULE LIST), and that it is willing to use a specific version of the API.
If the API version is wrong, the name is already taken, or there are other
similar errors, the function will return REDISMODULE_ERR
, and the module
OnLoad
function should return ASAP with an error.
Before the Init
function is called, no other API function can be called,
otherwise the module will segfault and the Redis instance will crash.
The second function called, RedisModule_CreateCommand
, is used in order
to register commands into the Redis core. The following is the prototype:
int RedisModule_CreateCommand(RedisModuleCtx *ctx, const char *name,
RedisModuleCmdFunc cmdfunc, const char *strflags,
int firstkey, int lastkey, int keystep);
As you can see, most Redis module API calls take as first argument the context of the module, so that they have a reference to the module calling them, and to the command and client executing a given command, and so forth.
To create a new command, the above function needs the context, the command’s name, a pointer to the function implementing the command, the command’s flags and the positions of key names in the command’s arguments.
The function that implements the command must have the following prototype:
int mycommand(RedisModuleCtx *ctx, RedisModuleString **argv, int argc);
The command function arguments are just the context, that will be passed to all the other API calls, the command argument vector, and total number of arguments, as passed by the user.
As you can see, the arguments are provided as pointers to a specific data
type, the RedisModuleString. This is an opaque data type; there are API functions to access and use it, and direct access to its fields is never needed.
Zooming into the example command implementation, we can find another call:
int RedisModule_ReplyWithLongLong(RedisModuleCtx *ctx, long long integer);
This function returns an integer to the client that invoked the command,
exactly like other Redis commands do, like for example INCR
or SCARD
.
Module cleanup
In most cases, there is no need for special cleanup.
When a module is unloaded, Redis will automatically unregister commands and
unsubscribe from notifications.
However in the case where a module contains some persistent memory or
configuration, a module may include an optional RedisModule_OnUnload
function.
If a module provides this function, it will be invoked during the module unload
process.
The following is the function prototype:
int RedisModule_OnUnload(RedisModuleCtx *ctx);
The OnUnload
function may prevent module unloading by returning
REDISMODULE_ERR
.
Otherwise, REDISMODULE_OK
should be returned.
Setup and dependencies of a Redis module
Redis modules don’t depend on Redis or some other library, nor do they
need to be compiled with a specific redismodule.h
file. In order
to create a new module, just copy a recent version of redismodule.h
in your source tree, link all the libraries you want, and create
a dynamic library having the RedisModule_OnLoad()
function symbol
exported.
The module will be able to load into different versions of Redis.
A module can be designed to support both newer and older Redis versions where certain API functions are not available in all versions. If an API function is not implemented in the currently running Redis version, the function pointer is set to NULL. This allows the module to check if a function exists before using it:
if (RedisModule_SetCommandInfo != NULL) {
RedisModule_SetCommandInfo(cmd, &info);
}
In recent versions of redismodule.h
, a convenience macro RMAPI_FUNC_SUPPORTED(funcname)
is defined.
Using the macro or just comparing with NULL is a matter of personal preference.
Passing configuration parameters to Redis modules
When the module is loaded with the MODULE LOAD
command, or using the
loadmodule
directive in the redis.conf
file, the user is able to pass
configuration parameters to the module by adding arguments after the module
file name:
loadmodule mymodule.so foo bar 1234
In the above example the strings foo
, bar
and 1234
will be passed
to the module OnLoad()
function in the argv
argument as an array
of RedisModuleString pointers. The number of arguments passed is in argc.
The way you can access those strings will be explained in the rest of this
document. Normally the module will store the module configuration parameters
in some static
global variable that can be accessed module wide, so that
the configuration can change the behavior of different commands.
Working with RedisModuleString objects
The command argument vector argv
passed to module commands, and the
return value of other module APIs functions, are of type RedisModuleString
.
Usually you directly pass module strings to other API calls, however sometimes you may need to directly access the string object.
There are a few functions in order to work with string objects:
const char *RedisModule_StringPtrLen(RedisModuleString *string, size_t *len);
The above function accesses a string by returning its pointer and setting its
length in len
.
You should never write to a string object pointer, as you can see from the
const
pointer qualifier.
However, if you want, you can create new string objects using the following API:
RedisModuleString *RedisModule_CreateString(RedisModuleCtx *ctx, const char *ptr, size_t len);
The string returned by the above command must be freed using a corresponding
call to RedisModule_FreeString()
:
void RedisModule_FreeString(RedisModuleString *str);
However if you want to avoid having to free strings, the automatic memory management, covered later in this document, can be a good alternative, by doing it for you.
Note that the strings provided via the argument vector argv
never need
to be freed. You only need to free new strings you create, or new strings
returned by other APIs, where it is specified that the returned string must
be freed.
Creating strings from numbers or parsing strings as numbers
Creating a new string from an integer is a very common operation, so there is a function to do this:
RedisModuleString *mystr = RedisModule_CreateStringFromLongLong(ctx,10);
Similarly in order to parse a string as a number:
long long myval;
if (RedisModule_StringToLongLong(ctx,argv[1],&myval) == REDISMODULE_OK) {
/* Do something with 'myval' */
}
Accessing Redis keys from modules
Most Redis modules, in order to be useful, have to interact with the Redis data space (this is not always true, for example an ID generator may never touch Redis keys). Redis modules have two different APIs in order to access the Redis data space. One is a low level API that provides very fast access and a set of functions to manipulate Redis data structures. The other API is more high level, and allows calling Redis commands and fetching the result, similarly to how Lua scripts access Redis.
The high level API is also useful in order to access Redis functionalities that are not available as APIs.
In general module developers should prefer the low level API, because commands implemented using the low level API run at a speed comparable to the speed of native Redis commands. However there are definitely use cases for the higher level API. For example, often the bottleneck could be processing the data, not accessing it.
Also note that sometimes using the low level API is not harder compared to the higher level one.
Calling Redis commands
The high level API to access Redis is the sum of the RedisModule_Call()
function, together with the functions needed in order to access the
reply object returned by Call()
.
RedisModule_Call
uses a special calling convention, with a format specifier
that is used to specify what kind of objects you are passing as arguments
to the function.
Redis commands are invoked just using a command name and a list of arguments.
However when calling commands, the arguments may originate from different
kind of strings: null-terminated C strings, RedisModuleString objects as
received from the argv
parameter in the command implementation, binary
safe C buffers with a pointer and a length, and so forth.
For example if I want to call INCRBY
using a first argument (the key)
a string received in the argument vector argv
, which is an array
of RedisModuleString object pointers, and a C string representing the
number “10” as second argument (the increment), I’ll use the following
function call:
RedisModuleCallReply *reply;
reply = RedisModule_Call(ctx,"INCRBY","sc",argv[1],"10");
The first argument is the context, and the second is always a null terminated
C string with the command name. The third argument is the format specifier
where each character corresponds to the type of the arguments that will follow.
In the above case "sc"
means a RedisModuleString object, and a null
terminated C string. The other arguments are just the two arguments as
specified. In fact argv[1]
is a RedisModuleString and "10"
is a null
terminated C string.
This is the full list of format specifiers:
- c – Null terminated C string pointer.
- b – C buffer, two arguments needed: C string pointer and size_t length.
- s – RedisModuleString as received in argv or by other Redis module APIs returning a RedisModuleString object.
- l – Long long integer.
- v – Array of RedisModuleString objects.
- ! – This modifier just tells the function to replicate the command to replicas and AOF. It is ignored from the point of view of arguments parsing.
- A – This modifier, when ! is given, tells to suppress AOF propagation: the command will be propagated only to replicas.
- R – This modifier, when ! is given, tells to suppress replicas propagation: the command will be propagated only to the AOF if enabled.
The function returns a RedisModuleCallReply
object on success, on
error NULL is returned.
NULL is returned when the command name is invalid, the format specifier uses characters that are not recognized, or when the command is called with the wrong number of arguments. In those cases the errno variable is set to EINVAL. NULL is also returned when, in an instance with Cluster enabled, the target keys hash to non-local slots. In this case errno is set to EPERM.
Working with RedisModuleCallReply objects
RedisModule_Call returns reply objects that can be accessed using the RedisModule_CallReply* family of functions.
In order to obtain the type of reply (corresponding to one of the data types supported by the Redis protocol), the function RedisModule_CallReplyType() is used:
reply = RedisModule_Call(ctx,"INCRBY","sc",argv[1],"10");
if (RedisModule_CallReplyType(reply) == REDISMODULE_REPLY_INTEGER) {
long long myval = RedisModule_CallReplyInteger(reply);
/* Do something with myval. */
}
Valid reply types are:
- REDISMODULE_REPLY_STRING – Bulk string or status replies.
- REDISMODULE_REPLY_ERROR – Errors.
- REDISMODULE_REPLY_INTEGER – Signed 64 bit integers.
- REDISMODULE_REPLY_ARRAY – Array of replies.
- REDISMODULE_REPLY_NULL – NULL reply.
Strings, errors and arrays have an associated length. For strings and errors the length corresponds to the length of the string. For arrays the length is the number of elements. To obtain the reply length the following function is used:
size_t reply_len = RedisModule_CallReplyLength(reply);
In order to obtain the value of an integer reply, the following function is used, as already shown in the example above:
long long reply_integer_val = RedisModule_CallReplyInteger(reply);
Called with a reply object of the wrong type, the above function always returns LLONG_MIN.
Sub elements of array replies are accessed this way:
RedisModuleCallReply *subreply;
subreply = RedisModule_CallReplyArrayElement(reply,idx);
The above function returns NULL if you try to access out of range elements.
Strings and errors (which are like strings but with a different type) can be accessed in the following way, making sure to never write to the resulting pointer (it is returned as a const pointer so that misuse must be pretty explicit):
size_t len;
char *ptr = RedisModule_CallReplyStringPtr(reply,&len);
If the reply type is not a string or an error, NULL is returned.
RedisModuleCallReply objects are not the same as module string objects (RedisModuleString types). However, sometimes you may need to pass replies of type string or integer to API functions expecting a module string.
When this is the case, you may want to evaluate if using the low level API could be a simpler way to implement your command, or you can use the following function in order to create a new string object from a call reply of type string, error or integer:
RedisModuleString *mystr = RedisModule_CreateStringFromCallReply(myreply);
If the reply is not of the right type, NULL is returned.
The returned string object should be released with RedisModule_FreeString() as usual, or by enabling automatic memory management (see the corresponding section).
Releasing call reply objects
Reply objects must be freed using RedisModule_FreeCallReply. For arrays, you need to free only the top-level reply, not the nested replies. Currently the module implementation provides protection against crashing if you free a nested reply object by mistake, however this feature is not guaranteed to be here forever, so it should not be considered part of the API.
If you use automatic memory management (explained later in this document) you don’t need to free replies (but you still could if you wish to release memory ASAP).
Returning values from Redis commands
Like normal Redis commands, new commands implemented via modules must be able to return values to the caller. The API exports a set of functions for this goal, in order to return the usual types of the Redis protocol, and arrays of such types as elements. Errors can also be returned, with any error string and code (the error code is the initial uppercase word in the error message, like the “BUSY” string in the “BUSY the server is busy” error message).
All the functions to send a reply to the client are called RedisModule_ReplyWith<something>.
To return an error, use:
RedisModule_ReplyWithError(RedisModuleCtx *ctx, const char *err);
There is a predefined error string for key-of-wrong-type errors: REDISMODULE_ERRORMSG_WRONGTYPE
Example usage:
RedisModule_ReplyWithError(ctx,"ERR invalid arguments");
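The error code convention mentioned above (the code is just the initial run of uppercase letters in the message) can be sketched as a tiny standalone helper. This is an illustration of the convention only, not part of the modules API; the function name error_code_len is hypothetical:

```c
#include <ctype.h>
#include <stddef.h>

/* Illustrative helper, not part of the modules API: return the length of
 * the error code, i.e. the initial run of uppercase letters, in an error
 * string such as "BUSY the server is busy". */
size_t error_code_len(const char *err) {
    size_t n = 0;
    while (err[n] != '\0' && isupper((unsigned char)err[n])) n++;
    return n;
}
```

For example, for "BUSY the server is busy" the helper reports 4, the length of the "BUSY" code, and for a message starting with "WRONGTYPE" it reports 9.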
We already saw how to reply with a long long in the examples above:
RedisModule_ReplyWithLongLong(ctx,12345);
To reply with a simple string that can’t contain binary values or newlines (so it’s suitable for sending small words, like “OK”), we use:
RedisModule_ReplyWithSimpleString(ctx,"OK");
It’s possible to reply with “bulk strings” that are binary safe, using two different functions:
int RedisModule_ReplyWithStringBuffer(RedisModuleCtx *ctx, const char *buf, size_t len);
int RedisModule_ReplyWithString(RedisModuleCtx *ctx, RedisModuleString *str);
The first function takes a C pointer and a length. The second takes a RedisModuleString object. Use one or the other depending on the source type you have at hand.
In order to reply with an array, you just need to use a function to emit the array length, followed by as many calls to the above functions as there are elements in the array:
RedisModule_ReplyWithArray(ctx,2);
RedisModule_ReplyWithStringBuffer(ctx,"age",3);
RedisModule_ReplyWithLongLong(ctx,22);
Returning nested arrays is easy: the nested array element just uses another call to RedisModule_ReplyWithArray() followed by the calls to emit the sub-array elements.
Returning arrays with dynamic length
Sometimes it is not possible to know beforehand the number of items of an array. As an example, think of a Redis module implementing a FACTOR command that, given a number, outputs its prime factors. Instead of factoring the number, storing the prime factors into an array, and later producing the command reply, a better solution is to start an array reply where the length is not known, and set it later. This is accomplished with a special argument to RedisModule_ReplyWithArray():
RedisModule_ReplyWithArray(ctx, REDISMODULE_POSTPONED_LEN);
The above call starts an array reply, so we can use other ReplyWith calls to produce the array items. Finally, in order to set the length, use the following call:
RedisModule_ReplySetArrayLength(ctx, number_of_items);
In the case of the FACTOR command, this translates to some code similar to this:
RedisModule_ReplyWithArray(ctx, REDISMODULE_POSTPONED_LEN);
number_of_factors = 0;
while(still_factors) {
    RedisModule_ReplyWithLongLong(ctx, some_factor);
    number_of_factors++;
}
RedisModule_ReplySetArrayLength(ctx, number_of_factors);
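In the snippet above, still_factors and some_factor are placeholders. As a rough, standalone sketch (plain C, no module API involved), the factorization loop itself might look like this, with the returned count being the number you would then pass to RedisModule_ReplySetArrayLength():

```c
#include <stddef.h>

/* Illustrative only: the factorization loop that the pseudocode above
 * leaves abstract. Writes the prime factors of n into out[] and returns
 * how many were found. Not part of the modules API. */
int prime_factors(long long n, long long *out, int maxout) {
    int count = 0;
    for (long long d = 2; d * d <= n && count < maxout; d++) {
        while (n % d == 0 && count < maxout) {
            out[count++] = d;   /* emit one factor, as ReplyWithLongLong would */
            n /= d;
        }
    }
    if (n > 1 && count < maxout) out[count++] = n;  /* leftover prime */
    return count;
}
```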
Another common use case for this feature is iterating over the items of some collection and only returning the ones passing some kind of filtering.
It is possible to have multiple nested arrays with postponed reply. Each call to RedisModule_ReplySetArrayLength() will set the length of the latest corresponding call to RedisModule_ReplyWithArray():
RedisModule_ReplyWithArray(ctx, REDISMODULE_POSTPONED_LEN);
... generate 100 elements ...
RedisModule_ReplyWithArray(ctx, REDISMODULE_POSTPONED_LEN);
... generate 10 elements ...
RedisModule_ReplySetArrayLength(ctx, 10);
RedisModule_ReplySetArrayLength(ctx, 100);
This creates a 100-item array whose last element is a 10-item array.
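The pairing rule above is LIFO, like a stack. The following toy model (not module code; begin_array and set_length are invented names standing in for ReplyWithArray(POSTPONED_LEN) and ReplySetArrayLength) shows which postponed array each length call resolves:

```c
/* Toy model of the postponed-length pairing rule: each set_length()
 * resolves the most recent unresolved begin_array() and returns its
 * identifier, mirroring the LIFO behavior described above. */
#define MAX_DEPTH 8
static int pending[MAX_DEPTH];      /* stack of unresolved arrays */
static int depth = 0;
static int lengths[MAX_DEPTH];      /* resolved length per array id */
static int next_id = 0;

int begin_array(void) {             /* like ReplyWithArray(POSTPONED_LEN) */
    int id = next_id++;
    pending[depth++] = id;
    return id;
}

int set_length(int len) {           /* like ReplySetArrayLength(len) */
    int id = pending[--depth];
    lengths[id] = len;
    return id;
}
```

Running the example from the text through this model, the first set_length(10) closes the inner (most recent) array and the second set_length(100) closes the outer one.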
Arity and type checks
Often commands need to check that the number of arguments and the type of the key are correct. In order to report a wrong arity, there is a specific function called RedisModule_WrongArity(). The usage is trivial:
if (argc != 2) return RedisModule_WrongArity(ctx);
Checking for the wrong type involves opening the key and checking the type:
RedisModuleKey *key = RedisModule_OpenKey(ctx,argv[1],
    REDISMODULE_READ|REDISMODULE_WRITE);
int keytype = RedisModule_KeyType(key);
if (keytype != REDISMODULE_KEYTYPE_STRING &&
    keytype != REDISMODULE_KEYTYPE_EMPTY)
{
    RedisModule_CloseKey(key);
    return RedisModule_ReplyWithError(ctx,REDISMODULE_ERRORMSG_WRONGTYPE);
}
Note that you often want to proceed with a command either if the key is of the expected type or if it’s empty.
Low level access to keys
Low level access to keys allows performing operations on the value objects associated with keys directly, at a speed similar to what Redis uses internally to implement the built-in commands.
Once a key is opened, a key pointer is returned that will be used with all the other low level API calls in order to perform operations on the key or its associated value.
Because the API is meant to be very fast, it cannot do too many run-time checks, so the user must be aware of certain rules to follow:
- Opening the same key multiple times where at least one instance is opened for writing is undefined and may lead to crashes.
- While a key is open, it should only be accessed via the low level key API. For example, opening a key and then calling DEL on the same key using the RedisModule_Call() API will result in a crash. However, it is safe to open a key, perform some operation with the low level API, close it, then use other APIs to manage the same key, and later open it again to do some more work.
In order to open a key, the RedisModule_OpenKey function is used. It returns a key pointer that we’ll use with all the next calls to access and modify the value:
RedisModuleKey *key;
key = RedisModule_OpenKey(ctx,argv[1],REDISMODULE_READ);
The second argument is the key name, which must be a RedisModuleString object. The third argument is the mode: REDISMODULE_READ or REDISMODULE_WRITE. It is possible to bitwise OR the two modes with | to open the key in both modes. Currently a key opened for writing can also be accessed for reading, but this is to be considered an implementation detail. The right mode should be used in sane modules.
You can open non-existing keys for writing, since the keys will be created when an attempt to write to the key is performed. However, when opening keys just for reading, RedisModule_OpenKey will return NULL if the key does not exist.
Once you are done using a key, you can close it with:
RedisModule_CloseKey(key);
Note that if automatic memory management is enabled, you are not forced to close keys. When the module function returns, Redis will take care to close all the keys which are still open.
Getting the key type
In order to obtain the type of a key, use the RedisModule_KeyType() function:
int keytype = RedisModule_KeyType(key);
It returns one of the following values:
REDISMODULE_KEYTYPE_EMPTY
REDISMODULE_KEYTYPE_STRING
REDISMODULE_KEYTYPE_LIST
REDISMODULE_KEYTYPE_HASH
REDISMODULE_KEYTYPE_SET
REDISMODULE_KEYTYPE_ZSET
The above are just the usual Redis key types, with the addition of an empty type that signals the key pointer is associated with an empty key that does not yet exist.
Creating new keys
To create a new key, open it for writing and then write to it using one of the key writing functions. Example:
RedisModuleKey *key;
key = RedisModule_OpenKey(ctx,argv[1],REDISMODULE_WRITE);
if (RedisModule_KeyType(key) == REDISMODULE_KEYTYPE_EMPTY) {
    RedisModule_StringSet(key,argv[2]);
}
Deleting keys
Just use:
RedisModule_DeleteKey(key);
The function returns REDISMODULE_ERR if the key is not open for writing. Note that after a key gets deleted, it is set up so that it can be targeted by new key commands. For example, RedisModule_KeyType() will report it as an empty key, and writing to it will create a new key, possibly of another type (depending on the API used).
Managing key expires (TTLs)
To control key expiration, two functions are provided that are able to set, modify, get, and unset the time to live associated with a key. One function is used to query the current expire of an open key:
mstime_t RedisModule_GetExpire(RedisModuleKey *key);
The function returns the time to live of the key in milliseconds, or REDISMODULE_NO_EXPIRE as a special value to signal that the key has no associated expire or does not exist at all (you can differentiate the two cases by checking if the key type is REDISMODULE_KEYTYPE_EMPTY).
In order to change the expire of a key the following function is used instead:
int RedisModule_SetExpire(RedisModuleKey *key, mstime_t expire);
When called on a non-existing key, REDISMODULE_ERR is returned, because the function can only associate expires with existing open keys (non-existing open keys are only useful in order to create new values with data type specific write operations).
Again, the expire time is specified in milliseconds. If the key currently has no expire, a new expire is set. If the key already has an expire, it is replaced with the new value.
If the key has an expire and the special value REDISMODULE_NO_EXPIRE is used as the new expire, the expire is removed, similarly to the Redis PERSIST command. In case the key was already persistent, no operation is performed.
Obtaining the length of values
There is a single function to retrieve the length of the value associated with an open key. The returned length is value-specific: it is the string length for strings, and the number of elements for the aggregated data types (how many elements there are in a list, set, sorted set, or hash).
size_t len = RedisModule_ValueLength(key);
If the key does not exist, 0 is returned by the function.
String type API
Setting a new string value, like the Redis SET command does, is performed using:
int RedisModule_StringSet(RedisModuleKey *key, RedisModuleString *str);
The function works exactly like the Redis SET command itself: if there is a prior value (of any type) it will be deleted.
Accessing existing string values is performed using DMA (direct memory access) for speed. The API will return a pointer and a length, so that it’s possible to access and, if needed, modify the string directly.
size_t len, j;
char *myptr = RedisModule_StringDMA(key,&len,REDISMODULE_WRITE);
for (j = 0; j < len; j++) myptr[j] = 'A';
In the above example we write directly to the string. Note that if you want to write, you must be sure to ask for WRITE mode.
DMA pointers are only valid if no other operations are performed with the key before using the pointer, after the DMA call.
Sometimes when we want to manipulate strings directly, we need to change their size as well. For this purpose, the RedisModule_StringTruncate function is used. Example:
RedisModule_StringTruncate(mykey,1024);
The function truncates or enlarges the string as needed, padding it with zero bytes if the previous length is smaller than the new length we request. If the string does not exist, since key is associated with an open empty key, a string value is created and associated with the key.
Note that every time StringTruncate() is called, we need to re-obtain the DMA pointer, since the old one may be invalid.
List type API
It’s possible to push and pop values from list values:
int RedisModule_ListPush(RedisModuleKey *key, int where, RedisModuleString *ele);
RedisModuleString *RedisModule_ListPop(RedisModuleKey *key, int where);
In both APIs, the where argument specifies whether to push or pop from the tail or the head, using the following macros:
REDISMODULE_LIST_HEAD
REDISMODULE_LIST_TAIL
Elements returned by RedisModule_ListPop() are like strings created with RedisModule_CreateString(): they must be released with RedisModule_FreeString() or by enabling automatic memory management.
Set type API
Work in progress.
Sorted set type API
Documentation missing, please refer to the top comments inside module.c for the following functions:
RedisModule_ZsetAdd
RedisModule_ZsetIncrby
RedisModule_ZsetScore
RedisModule_ZsetRem
And for the sorted set iterator:
RedisModule_ZsetRangeStop
RedisModule_ZsetFirstInScoreRange
RedisModule_ZsetLastInScoreRange
RedisModule_ZsetFirstInLexRange
RedisModule_ZsetLastInLexRange
RedisModule_ZsetRangeCurrentElement
RedisModule_ZsetRangeNext
RedisModule_ZsetRangePrev
RedisModule_ZsetRangeEndReached
Hash type API
Documentation missing, please refer to the top comments inside module.c for the following functions:
RedisModule_HashSet
RedisModule_HashGet
Iterating aggregated values
Work in progress.
Replicating commands
If you want to use module commands exactly like normal Redis commands, in the context of replicated Redis instances, or using the AOF file for persistence, it is important for module commands to handle their replication in a consistent way.
When using the higher level APIs to invoke commands, replication happens automatically if you use the “!” modifier in the format string of RedisModule_Call(), as in the following example:
reply = RedisModule_Call(ctx,"INCRBY","!sc",argv[1],"10");
As you can see, the format specifier is "!sc". The bang is not parsed as a format specifier, but it internally flags the command as “must replicate”.
If you use the above programming style, there are no problems. However, sometimes things are more complex than that, and you use the low level API. In this case, if there are no side effects in the command execution, and it consistently always performs the same work, it is possible to replicate the command verbatim as the user executed it. To do that, you just need to call the following function:
RedisModule_ReplicateVerbatim(ctx);
When you use the above API, you should not use any other replication function since they are not guaranteed to mix well.
However, this is not the only option. It’s also possible to tell Redis exactly which commands to replicate as the effect of the command execution, using an API similar to RedisModule_Call() but that, instead of calling the command, sends it to the AOF / replicas stream. Example:
RedisModule_Replicate(ctx,"INCRBY","cl","foo",my_increment);
It’s possible to call RedisModule_Replicate
multiple times, and each
will emit a command. All the sequence emitted is wrapped between a
MULTI/EXEC
transaction, so that the AOF and replication effects are the
same as executing a single command.
Note that Call() replication and Replicate() replication follow a rule, in case you want to mix both forms of replication (not necessarily a good idea if there are simpler approaches): commands replicated with Call() are always emitted first in the final MULTI/EXEC block, while all the commands emitted with Replicate() will follow.
Automatic memory management
Normally when writing programs in the C language, programmers need to manage memory manually. This is why the Redis modules API has functions to release strings, close open keys, free replies, and so forth.
However given that commands are executed in a contained environment and with a set of strict APIs, Redis is able to provide automatic memory management to modules, at the cost of some performance (most of the time, a very low cost).
When automatic memory management is enabled:
- You don’t need to close open keys.
- You don’t need to free replies.
- You don’t need to free RedisModuleString objects.
However you can still do it, if you want. For example, automatic memory management may be active, but inside a loop allocating a lot of strings, you may still want to free strings no longer used.
In order to enable automatic memory management, just call the following function at the start of the command implementation:
RedisModule_AutoMemory(ctx);
Automatic memory management is usually the way to go; however, experienced C programmers may choose not to use it in order to gain some speed and memory usage benefits.
Allocating memory into modules
Normal C programs use malloc() and free() to allocate and release memory dynamically. While in Redis modules the use of malloc is not technically forbidden, it is a lot better to use the Redis Modules specific functions, which are exact replacements for malloc, free, realloc and strdup. These functions are:
void *RedisModule_Alloc(size_t bytes);
void* RedisModule_Realloc(void *ptr, size_t bytes);
void RedisModule_Free(void *ptr);
void *RedisModule_Calloc(size_t nmemb, size_t size);
char *RedisModule_Strdup(const char *str);
They work exactly like their libc equivalents, however they use the same allocator Redis uses, and the memory allocated using these functions is reported by the INFO command in the memory section, is accounted for when enforcing the maxmemory policy, and in general is a first-class citizen of the Redis executable. On the contrary, memory allocated inside modules with libc malloc() is invisible to Redis.
Another reason to use the module allocation functions is that, when creating native data types inside modules, the RDB loading functions can return deserialized strings (from the RDB file) directly as RedisModule_Alloc() allocations, so they can be used directly to populate data structures after loading, instead of having to copy them into the data structure.
Pool allocator
Sometimes in command implementations it is required to perform many small allocations that will not be retained at the end of the command execution, but are just functional to executing the command itself.
This work can be more easily accomplished using the Redis pool allocator:
void *RedisModule_PoolAlloc(RedisModuleCtx *ctx, size_t bytes);
It works similarly to malloc(), and returns memory aligned to the next power of two greater than or equal to bytes (for a maximum alignment of 8 bytes). However, it allocates memory in blocks, so the overhead of the allocations is small, and, more importantly, the memory allocated is automatically released when the command returns.
So, in general, short-lived allocations are good candidates for the pool allocator.
Writing commands compatible with Redis Cluster
Documentation missing, please check the following functions inside module.c:
RedisModule_IsKeysPositionRequest(ctx);
RedisModule_KeyAtPos(ctx,pos);
7.1 - Modules API reference
Sections
- Heap allocation raw functions
- Commands API
- Module information and time measurement
- Automatic memory management for modules
- String objects APIs
- Reply APIs
- Commands replication API
- DB and Key APIs – Generic API
- Key API for String type
- Key API for List type
- Key API for Sorted Set type
- Key API for Sorted Set iterator
- Key API for Hash type
- Key API for Stream type
- Calling Redis commands from modules
- Modules data types
- RDB loading and saving functions
- Key digest API (DEBUG DIGEST interface for modules types)
- AOF API for modules data types
- IO context handling
- Logging
- Blocking clients from modules
- Thread Safe Contexts
- Module Keyspace Notifications API
- Modules Cluster API
- Modules Timers API
- Modules EventLoop API
- Modules ACL API
- Modules Dictionary API
- Modules Info fields
- Modules utility APIs
- Modules API exporting / importing
- Module Command Filter API
- Scanning keyspace and hashes
- Module fork API
- Server hooks implementation
- Key eviction API
- Miscellaneous APIs
- Defrag API
- Function index
Heap allocation raw functions
Memory allocated with these functions is taken into account by the Redis key eviction algorithms and is reported in Redis memory usage information.
RedisModule_Alloc
void *RedisModule_Alloc(size_t bytes);
Available since: 4.0.0
Use like malloc(). Memory allocated with this function is reported in Redis INFO memory, is used for key eviction according to maxmemory settings, and in general is taken into account as memory allocated by Redis. You should avoid using malloc() directly.
RedisModule_Calloc
void *RedisModule_Calloc(size_t nmemb, size_t size);
Available since: 4.0.0
Use like calloc(). Memory allocated with this function is reported in Redis INFO memory, is used for key eviction according to maxmemory settings, and in general is taken into account as memory allocated by Redis. You should avoid using calloc() directly.
RedisModule_Realloc
void* RedisModule_Realloc(void *ptr, size_t bytes);
Available since: 4.0.0
Use like realloc() for memory obtained with RedisModule_Alloc().
RedisModule_Free
void RedisModule_Free(void *ptr);
Available since: 4.0.0
Use like free() for memory obtained by RedisModule_Alloc() and RedisModule_Realloc(). However, you should never try to free with RedisModule_Free() memory allocated with malloc() inside your module.
RedisModule_Strdup
char *RedisModule_Strdup(const char *str);
Available since: 4.0.0
Like strdup() but returns memory allocated with RedisModule_Alloc().
RedisModule_PoolAlloc
void *RedisModule_PoolAlloc(RedisModuleCtx *ctx, size_t bytes);
Available since: 4.0.0
Return heap allocated memory that will be freed automatically when the module callback function returns. Mostly suitable for small allocations that are short living and must be released when the callback returns anyway. The returned memory is aligned to the architecture word size if at least word size bytes are requested, otherwise it is just aligned to the next power of two, so for example a 3 bytes request is 4 bytes aligned while a 2 bytes request is 2 bytes aligned.
There is no realloc-style function, since when reallocation is needed, using the pool allocator is not a good idea. The function returns NULL if bytes is 0.
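The alignment rule described in the entry above can be sketched as a small standalone function. This is an illustration only (assuming an 8-byte architecture word), not the actual Redis implementation; pool_alignment is an invented name:

```c
#include <stddef.h>

/* Illustrative only: the alignment rule described above, assuming an
 * 8-byte word. Requests of at least a word are word aligned; smaller
 * requests are aligned to the next power of two greater than or equal
 * to the requested size. */
size_t pool_alignment(size_t bytes) {
    const size_t word = 8;
    if (bytes >= word) return word;   /* word-size or larger: word aligned */
    size_t a = 1;
    while (a < bytes) a <<= 1;        /* next power of two >= bytes */
    return a;
}
```

Matching the text: a 3-byte request gets 4-byte alignment, while a 2-byte request gets 2-byte alignment.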
Commands API
These functions are used to implement custom Redis commands.
For examples, see https://redis.io/topics/modules-intro.
RedisModule_IsKeysPositionRequest
int RedisModule_IsKeysPositionRequest(RedisModuleCtx *ctx);
Available since: 4.0.0
Return non-zero if a module command, that was declared with the flag “getkeys-api”, is called in a special way to get the keys positions and not to get executed. Otherwise zero is returned.
RedisModule_KeyAtPosWithFlags
void RedisModule_KeyAtPosWithFlags(RedisModuleCtx *ctx, int pos, int flags);
When a module command is called in order to obtain the position of keys, since it was flagged as “getkeys-api” during the registration, the command implementation checks for this special call using the RedisModule_IsKeysPositionRequest() API and uses this function in order to report keys.
The supported flags are the ones used by RedisModule_SetCommandInfo, see REDISMODULE_CMD_KEY_*.
The following is an example of how it could be used:
if (RedisModule_IsKeysPositionRequest(ctx)) {
    RedisModule_KeyAtPosWithFlags(ctx, 2, REDISMODULE_CMD_KEY_RO | REDISMODULE_CMD_KEY_ACCESS);
    RedisModule_KeyAtPosWithFlags(ctx, 1, REDISMODULE_CMD_KEY_RW | REDISMODULE_CMD_KEY_UPDATE | REDISMODULE_CMD_KEY_ACCESS);
}
Note: in the example above, the get keys API could have been handled by key-specs (preferred). Implementing the getkeys-api is required only when it is not possible to declare key-specs that cover all keys.
RedisModule_KeyAtPos
void RedisModule_KeyAtPos(RedisModuleCtx *ctx, int pos);
Available since: 4.0.0
This API existed before RedisModule_KeyAtPosWithFlags was added. It is now deprecated, and can be used for compatibility with older versions, before key-specs and flags were introduced.
RedisModule_IsChannelsPositionRequest
int RedisModule_IsChannelsPositionRequest(RedisModuleCtx *ctx);
Return non-zero if a module command, that was declared with the flag “getchannels-api”, is called in a special way to get the channel positions and not to get executed. Otherwise zero is returned.
RedisModule_ChannelAtPosWithFlags
void RedisModule_ChannelAtPosWithFlags(RedisModuleCtx *ctx,
int pos,
int flags);
When a module command is called in order to obtain the position of channels, since it was flagged as “getchannels-api” during the registration, the command implementation checks for this special call using the RedisModule_IsChannelsPositionRequest() API and uses this function in order to report the channels.
The supported flags are:
- REDISMODULE_CMD_CHANNEL_SUBSCRIBE: This command will subscribe to the channel.
- REDISMODULE_CMD_CHANNEL_UNSUBSCRIBE: This command will unsubscribe from this channel.
- REDISMODULE_CMD_CHANNEL_PUBLISH: This command will publish to this channel.
- REDISMODULE_CMD_CHANNEL_PATTERN: Instead of acting on a specific channel, will act on any channel specified by the pattern. This is the same access used by the PSUBSCRIBE and PUNSUBSCRIBE commands available in Redis. Not intended to be used with PUBLISH permissions.
The following is an example of how it could be used:
if (RedisModule_IsChannelsPositionRequest(ctx)) {
    RedisModule_ChannelAtPosWithFlags(ctx, 1, REDISMODULE_CMD_CHANNEL_SUBSCRIBE | REDISMODULE_CMD_CHANNEL_PATTERN);
    RedisModule_ChannelAtPosWithFlags(ctx, 1, REDISMODULE_CMD_CHANNEL_PUBLISH);
}
Note: one usage of declaring channels is for evaluating ACL permissions. In this context, unsubscribing is always allowed, so commands will only be checked against subscribe and publish permissions. This is preferred over using RedisModule_ACLCheckChannelPermissions, since it allows the ACLs to be checked before the command is executed.
RedisModule_CreateCommand
int RedisModule_CreateCommand(RedisModuleCtx *ctx,
const char *name,
RedisModuleCmdFunc cmdfunc,
const char *strflags,
int firstkey,
int lastkey,
int keystep);
Available since: 4.0.0
Register a new command in the Redis server, which will be handled by calling the function pointer ‘cmdfunc’ using the RedisModule calling convention. The function returns REDISMODULE_ERR if the specified command name is already busy or a set of invalid flags was passed; otherwise REDISMODULE_OK is returned and the new command is registered.
This function must be called during the initialization of the module, inside the RedisModule_OnLoad() function. Calling this function outside of the initialization function is not defined.
The command function type is the following:
int MyCommand_RedisCommand(RedisModuleCtx *ctx, RedisModuleString **argv, int argc);
And it is supposed to always return REDISMODULE_OK.
The set of flags ‘strflags’ specify the behavior of the command, and should be passed as a C string composed of space separated words, like for example “write deny-oom”. The set of flags are:
- “write”: The command may modify the data set (it may also read from it).
- “readonly”: The command returns data from keys but never writes.
- “admin”: The command is an administrative command (may change replication or perform similar tasks).
- “deny-oom”: The command may use additional memory and should be denied during out of memory conditions.
- “deny-script”: Don’t allow this command in Lua scripts.
- “allow-loading”: Allow this command while the server is loading data. Only commands not interacting with the data set should be allowed to run in this mode. If not sure don’t use this flag.
- “pubsub”: The command publishes things on Pub/Sub channels.
- “random”: The command may have different outputs even starting from the same input arguments and key values. Starting from Redis 7.0 this flag has been deprecated. Declaring a command as “random” can be done using command tips, see https://redis.io/topics/command-tips.
- “allow-stale”: The command is allowed to run on slaves that don’t serve stale data. Don’t use if you don’t know what this means.
- “no-monitor”: Don’t propagate the command on MONITOR. Use this if the command has sensitive data among the arguments.
- “no-slowlog”: Don’t log this command in the slowlog. Use this if the command has sensitive data among the arguments.
- “fast”: The command time complexity is not greater than O(log(N)) where N is the size of the collection or anything else representing the normal scalability issue with the command.
- “getkeys-api”: The command implements the interface to return the arguments that are keys. Used when start/stop/step is not enough because of the command syntax.
- “no-cluster”: The command should not be registered in Redis Cluster, since it is not designed to work with it because, for example, it is unable to report the position of the keys, it programmatically creates key names, or for any other reason.
- “no-auth”: This command can be run by an un-authenticated client. Normally this is used by a command that is used to authenticate a client.
- “may-replicate”: This command may generate replication traffic, even though it’s not a write command.
- “no-mandatory-keys”: All the keys this command may take are optional.
- “blocking”: The command has the potential to block the client.
- “allow-busy”: Permit the command while the server is blocked either by a script or by a slow module command, see RM_Yield.
- “getchannels-api”: The command implements the interface to return the arguments that are channels.
The last three parameters specify which arguments of the new command are Redis keys. See https://redis.io/commands/command for more information.
- firstkey: One-based index of the first argument that’s a key. Position 0 is always the command name itself. 0 for commands with no keys.
- lastkey: One-based index of the last argument that’s a key. Negative numbers refer to counting backwards from the last argument (-1 means the last argument provided). 0 for commands with no keys.
- keystep: Step between first and last key indexes. 0 for commands with no keys.
This information is used by ACL, Cluster and the COMMAND
command.
NOTE: The scheme described above serves a limited purpose and can only be used to find keys that exist at constant indices. For non-trivial key arguments, you may pass 0,0,0 and use RedisModule_SetCommandInfo to set key specs using a more advanced scheme.
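The firstkey/lastkey/keystep scheme above can be sketched as standalone logic. This is an illustration of the documented rules, not actual Redis source; key_positions is an invented name:

```c
/* Illustrative only: enumerate the argument indices that are keys,
 * given the first/last/step declaration described above. Fills out[]
 * with the positions in argv that are keys and returns the count. */
int key_positions(int argc, int firstkey, int lastkey, int keystep,
                  int *out, int maxout) {
    int n = 0;
    if (firstkey == 0 || keystep == 0) return 0;          /* no keys declared */
    int last = (lastkey < 0) ? argc + lastkey : lastkey;  /* -1 = last argument */
    for (int pos = firstkey; pos <= last && n < maxout; pos += keystep)
        out[n++] = pos;
    return n;
}
```

For instance, a command invoked as MSET k1 v1 k2 v2 (argc = 5) declared with 1,-1,2 yields key positions 1 and 3, while GET key declared with 1,1,1 yields position 1.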
RedisModule_GetCommand
RedisModuleCommand *RedisModule_GetCommand(RedisModuleCtx *ctx,
const char *name);
Get an opaque structure, representing a module command, by command name. This structure is used in some of the command-related APIs.
NULL is returned in case of the following errors:
- Command not found
- The command is not a module command
- The command doesn’t belong to the calling module
RedisModule_CreateSubcommand
int RedisModule_CreateSubcommand(RedisModuleCommand *parent,
const char *name,
RedisModuleCmdFunc cmdfunc,
const char *strflags,
int firstkey,
int lastkey,
int keystep);
Very similar to RedisModule_CreateCommand
except that it is used to create
a subcommand, associated with another, container, command.
Example: If a module has a configuration command, MODULE.CONFIG, then
GET and SET should be individual subcommands, while MODULE.CONFIG is
a command, but should not be registered with a valid funcptr
:
if (RedisModule_CreateCommand(ctx,"module.config",NULL,"",0,0,0) == REDISMODULE_ERR)
return REDISMODULE_ERR;
RedisModuleCommand *parent = RedisModule_GetCommand(ctx,"module.config");
if (RedisModule_CreateSubcommand(parent,"set",cmd_config_set,"",0,0,0) == REDISMODULE_ERR)
return REDISMODULE_ERR;
if (RedisModule_CreateSubcommand(parent,"get",cmd_config_get,"",0,0,0) == REDISMODULE_ERR)
return REDISMODULE_ERR;
Returns REDISMODULE_OK
on success and REDISMODULE_ERR
in case of the following errors:
- Error while parsing strflags
- Command is marked as no-cluster but cluster mode is enabled
- parent is already a subcommand (we do not allow more than one level of command nesting)
- parent is a command with an implementation (RedisModuleCmdFunc) (a parent command should be a pure container of subcommands)
- parent already has a subcommand called name
RedisModule_SetCommandInfo
int RedisModule_SetCommandInfo(RedisModuleCommand *command,
const RedisModuleCommandInfo *info);
Set additional command information.
Affects the output of COMMAND
, COMMAND INFO
and COMMAND DOCS
, Cluster,
ACL and is used to filter commands with the wrong number of arguments before
the call reaches the module code.
This function can be called after creating a command using RedisModule_CreateCommand
and fetching the command pointer using RedisModule_GetCommand
. The information can
only be set once for each command and has the following structure:
typedef struct RedisModuleCommandInfo {
const RedisModuleCommandInfoVersion *version;
const char *summary;
const char *complexity;
const char *since;
RedisModuleCommandHistoryEntry *history;
const char *tips;
int arity;
RedisModuleCommandKeySpec *key_specs;
RedisModuleCommandArg *args;
} RedisModuleCommandInfo;
All fields except version
are optional. Explanation of the fields:
-
version
: This field enables compatibility with different Redis versions. Always set this field toREDISMODULE_COMMAND_INFO_VERSION
. -
summary
: A short description of the command (optional). -
complexity
: Complexity description (optional). -
since
: The version where the command was introduced (optional). Note: The version specified should be the module’s, not Redis version. -
history
: An array ofRedisModuleCommandHistoryEntry
(optional), which is a struct with the following fields:
const char *since;
const char *changes;
since
is a version string andchanges
is a string describing the changes. The array is terminated by a zeroed entry, i.e. an entry with both strings set to NULL. -
tips
: A string of space-separated tips regarding this command, meant for clients and proxies. See https://redis.io/topics/command-tips. -
arity
: Number of arguments, including the command name itself. A positive number specifies an exact number of arguments and a negative number specifies a minimum number of arguments, so use -N to say >= N. Redis validates a call before passing it to a module, so this can replace an arity check inside the module command implementation. A value of 0 (or an omitted arity field) is equivalent to -2 if the command has sub commands and -1 otherwise. -
key_specs
: An array ofRedisModuleCommandKeySpec
, terminated by an element memset to zero. This is a scheme that tries to describe the positions of key arguments better than the oldRedisModule_CreateCommand
argumentsfirstkey
,lastkey
,keystep
and is needed if those three are not enough to describe the key positions. There are two steps to retrieve key positions: begin search (BS), which finds the index of the first key, and find keys (FK), which, relative to the output of BS, describes which arguments are keys. Additionally, there are key-specific flags. Key-specs cause the triplet (firstkey, lastkey, keystep) given in RM_CreateCommand to be recomputed, but it is still useful to provide these three parameters in RM_CreateCommand, to better support old Redis versions where RM_SetCommandInfo is not available.
Note that key-specs don’t fully replace the “getkeys-api” (see RM_CreateCommand, RM_IsKeysPositionRequest and RM_KeyAtPosWithFlags) so it may be a good idea to supply both key-specs and implement the getkeys-api.
A key-spec has the following structure:
typedef struct RedisModuleCommandKeySpec {
    const char *notes;
    uint64_t flags;
    RedisModuleKeySpecBeginSearchType begin_search_type;
    union {
        struct {
            int pos;
        } index;
        struct {
            const char *keyword;
            int startfrom;
        } keyword;
    } bs;
    RedisModuleKeySpecFindKeysType find_keys_type;
    union {
        struct {
            int lastkey;
            int keystep;
            int limit;
        } range;
        struct {
            int keynumidx;
            int firstkey;
            int keystep;
        } keynum;
    } fk;
} RedisModuleCommandKeySpec;
Explanation of the fields of RedisModuleCommandKeySpec:
-
notes
: Optional notes or clarifications about this key spec. -
flags
: A bitwise or of key-spec flags described below. -
begin_search_type
: This describes how the first key is discovered. There are two ways to determine the first key:REDISMODULE_KSPEC_BS_UNKNOWN
: There is no way to tell where the key args start.REDISMODULE_KSPEC_BS_INDEX
: Key args start at a constant index.REDISMODULE_KSPEC_BS_KEYWORD
: Key args start just after a specific keyword.
-
bs
: This is a union in which theindex
orkeyword
branch is used depending on the value of thebegin_search_type
field.-
bs.index.pos
: The index from which we start the search for keys. (REDISMODULE_KSPEC_BS_INDEX
only.) -
bs.keyword.keyword
: The keyword (string) that indicates the beginning of key arguments. (REDISMODULE_KSPEC_BS_KEYWORD
only.) -
bs.keyword.startfrom
: An index in argv from which to start searching. Can be negative, which means start search from the end, in reverse. Example: -2 means to start in reverse from the penultimate argument. (REDISMODULE_KSPEC_BS_KEYWORD
only.)
-
-
find_keys_type
: After the “begin search”, this describes which arguments are keys. The strategies are:REDISMODULE_KSPEC_FK_UNKNOWN
: There is no way to tell where the key args are located.REDISMODULE_KSPEC_FK_RANGE
: Keys end at a specific index (or relative to the last argument).REDISMODULE_KSPEC_FK_KEYNUM
: There’s an argument that contains the number of key args somewhere before the keys themselves.
find_keys_type
andfk
can be omitted if this keyspec describes exactly one key. -
fk
: This is a union in which therange
orkeynum
branch is used depending on the value of thefind_keys_type
field.-
fk.range
(forREDISMODULE_KSPEC_FK_RANGE
): A struct with the following fields:-
lastkey
: Index of the last key relative to the result of the begin search step. Can be negative, in which case it’s not relative. -1 indicates the last argument, -2 one before the last and so on. -
keystep
: How many arguments should we skip after finding a key, in order to find the next one? -
limit
: Iflastkey
is -1, we uselimit
to stop the search by a factor. 0 and 1 mean no limit. 2 means 1/2 of the remaining args, 3 means 1/3, and so on.
-
-
fk.keynum
(forREDISMODULE_KSPEC_FK_KEYNUM
): A struct with the following fields:-
keynumidx
: Index of the argument containing the number of keys to come, relative to the result of the begin search step. -
firstkey
: Index of the first key relative to the result of the begin search step. (Usually it’s just afterkeynumidx
, in which case it should be set tokeynumidx + 1
.) -
keystep
: How many arguments should we skip after finding a key, in order to find the next one?
-
-
Key-spec flags:
The first four refer to what the command actually does with the value or metadata of the key, and not necessarily the user data or how it affects it. Each key-spec must have exactly one of these. Any operation that’s not distinctly deletion, overwrite or read-only would be marked as RW.
-
REDISMODULE_CMD_KEY_RO
: Read-Only. Reads the value of the key, but doesn’t necessarily return it. -
REDISMODULE_CMD_KEY_RW
: Read-Write. Modifies the data stored in the value of the key or its metadata. -
REDISMODULE_CMD_KEY_OW
: Overwrite. Overwrites the data stored in the value of the key. -
REDISMODULE_CMD_KEY_RM
: Deletes the key.
The next four refer to user data inside the value of the key, not the metadata like LRU, type, cardinality. It refers to the logical operation on the user’s data (actual input strings or TTL), being used/returned/copied/changed. It doesn’t refer to modification or returning of metadata (like type, count, presence of data). ACCESS can be combined with one of the write operations INSERT, DELETE or UPDATE. Any write that’s not an INSERT or a DELETE would be UPDATE.
-
REDISMODULE_CMD_KEY_ACCESS
: Returns, copies or uses the user data from the value of the key. -
REDISMODULE_CMD_KEY_UPDATE
: Updates data to the value, new value may depend on the old value. -
REDISMODULE_CMD_KEY_INSERT
: Adds data to the value with no chance of modification or deletion of existing data. -
REDISMODULE_CMD_KEY_DELETE
: Explicitly deletes some content from the value of the key.
Other flags:
-
REDISMODULE_CMD_KEY_NOT_KEY
: The key is not actually a key, but should be routed in cluster mode as if it was a key. -
REDISMODULE_CMD_KEY_INCOMPLETE
: The keyspec might not point out all the keys it should cover. -
REDISMODULE_CMD_KEY_VARIABLE_FLAGS
: Some keys might have different flags depending on arguments.
-
-
args
: An array ofRedisModuleCommandArg
, terminated by an element memset to zero.RedisModuleCommandArg
is a structure with the fields described below.
typedef struct RedisModuleCommandArg {
    const char *name;
    RedisModuleCommandArgType type;
    int key_spec_index;
    const char *token;
    const char *summary;
    const char *since;
    int flags;
    struct RedisModuleCommandArg *subargs;
} RedisModuleCommandArg;
Explanation of the fields:
-
name
: Name of the argument. -
type
: The type of the argument. See below for details. The typesREDISMODULE_ARG_TYPE_ONEOF
andREDISMODULE_ARG_TYPE_BLOCK
require an argument to have sub-arguments, i.e.subargs
. -
key_spec_index
: If thetype
isREDISMODULE_ARG_TYPE_KEY
you must provide the index of the key-spec associated with this argument. Seekey_specs
above. If the argument is not a key, you may specify -1. -
token
: The token preceding the argument (optional). Example: the argumentseconds
inSET
has a tokenEX
. If the argument consists of only a token (for exampleNX
inSET
) the type should beREDISMODULE_ARG_TYPE_PURE_TOKEN
andvalue
should be NULL. -
summary
: A short description of the argument (optional). -
since
: The first version which included this argument (optional). -
flags
: A bitwise or of the macrosREDISMODULE_CMD_ARG_*
. See below. -
value
: The display-value of the argument. This string is what should be displayed when creating the command syntax from the output ofCOMMAND
. Iftoken
is not NULL, it should also be displayed.
Explanation of
RedisModuleCommandArgType
:REDISMODULE_ARG_TYPE_STRING
: String argument.REDISMODULE_ARG_TYPE_INTEGER
: Integer argument.REDISMODULE_ARG_TYPE_DOUBLE
: Double-precision float argument.REDISMODULE_ARG_TYPE_KEY
: String argument representing a keyname.REDISMODULE_ARG_TYPE_PATTERN
: String, but regex pattern.REDISMODULE_ARG_TYPE_UNIX_TIME
: Integer, but Unix timestamp.REDISMODULE_ARG_TYPE_PURE_TOKEN
: Argument doesn’t have a placeholder. It’s just a token without a value. Example: theKEEPTTL
option of theSET
command.REDISMODULE_ARG_TYPE_ONEOF
: Used when the user can choose only one of a few sub-arguments. Requiressubargs
. Example: theNX
andXX
options ofSET
.REDISMODULE_ARG_TYPE_BLOCK
: Used when one wants to group together several sub-arguments, usually to apply something on all of them, like making the entire group “optional”. Requiressubargs
. Example: theLIMIT offset count
parameters inZRANGE
.
Explanation of the command argument flags:
REDISMODULE_CMD_ARG_OPTIONAL
: The argument is optional (like GET in the SET command).REDISMODULE_CMD_ARG_MULTIPLE
: The argument may repeat itself (like key in DEL).REDISMODULE_CMD_ARG_MULTIPLE_TOKEN
: The argument may repeat itself, and so does its token (likeGET pattern
in SORT).
-
On success REDISMODULE_OK
is returned. On error REDISMODULE_ERR
is returned
and errno
is set to EINVAL if invalid info was provided or EEXIST if info
has already been set. If the info is invalid, a warning is logged explaining
which part of the info is invalid and why.
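Putting the pieces together, here is a sketch (hypothetical command name MYMODULE.GET and version string) of declaring info for a command with a single key at index 1, after creating it and fetching its handle:

```c
/* Sketch: set metadata for a hypothetical MYMODULE.GET <key> command.
 * Assumes the command was already created with RedisModule_CreateCommand. */
RedisModuleCommand *cmd = RedisModule_GetCommand(ctx, "mymodule.get");
if (cmd == NULL) return REDISMODULE_ERR;

RedisModuleCommandInfo info = {
    .version = REDISMODULE_COMMAND_INFO_VERSION,
    .summary = "Get the value stored by mymodule",
    .since = "1.0.0",   /* the module's version, not the Redis version */
    .arity = 2,         /* command name plus exactly one key */
    .key_specs = (RedisModuleCommandKeySpec[]){
        {
            .flags = REDISMODULE_CMD_KEY_RO | REDISMODULE_CMD_KEY_ACCESS,
            .begin_search_type = REDISMODULE_KSPEC_BS_INDEX,
            .bs.index.pos = 1,                 /* key args start at argv[1] */
            .find_keys_type = REDISMODULE_KSPEC_FK_RANGE,
            .fk.range = {.lastkey = 0, .keystep = 1, .limit = 0},
        },
        {0}  /* array terminated by a zeroed element */
    },
};
if (RedisModule_SetCommandInfo(cmd, &info) == REDISMODULE_ERR)
    return REDISMODULE_ERR;
```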
Module information and time measurement
RedisModule_IsModuleNameBusy
int RedisModule_IsModuleNameBusy(const char *name);
Available since: 4.0.3
Return non-zero if the module name is busy. Otherwise zero is returned.
RedisModule_Milliseconds
long long RedisModule_Milliseconds(void);
Available since: 4.0.0
Return the current UNIX time in milliseconds.
RedisModule_MonotonicMicroseconds
uint64_t RedisModule_MonotonicMicroseconds(void);
Return counter of micro-seconds relative to an arbitrary point in time.
RedisModule_BlockedClientMeasureTimeStart
int RedisModule_BlockedClientMeasureTimeStart(RedisModuleBlockedClient *bc);
Available since: 6.2.0
Mark a point in time that will be used as the start time to calculate
the elapsed execution time when RedisModule_BlockedClientMeasureTimeEnd()
is called.
Within the same command, you can call multiple times
RedisModule_BlockedClientMeasureTimeStart()
and RedisModule_BlockedClientMeasureTimeEnd()
to accumulate independent time intervals to the background duration.
This method always returns REDISMODULE_OK
.
RedisModule_BlockedClientMeasureTimeEnd
int RedisModule_BlockedClientMeasureTimeEnd(RedisModuleBlockedClient *bc);
Available since: 6.2.0
Mark a point in time that will be used as the end time
to calculate the elapsed execution time.
On success REDISMODULE_OK
is returned.
This method only returns REDISMODULE_ERR
if no start time was
previously defined (meaning RedisModule_BlockedClientMeasureTimeStart
was not called).
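A minimal sketch of the intended usage, assuming a background thread obtained from RedisModule_BlockClient() and a hypothetical do_heavy_computation() helper:

```c
/* Sketch: account the time spent in background work to the blocked
 * client's command duration. 'bc' comes from RedisModule_BlockClient(). */
void *Worker_Thread(void *arg) {
    RedisModuleBlockedClient *bc = arg;

    RedisModule_BlockedClientMeasureTimeStart(bc);
    do_heavy_computation();   /* hypothetical long-running work */
    RedisModule_BlockedClientMeasureTimeEnd(bc);

    RedisModule_UnblockClient(bc, NULL);
    return NULL;
}
```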
RedisModule_Yield
void RedisModule_Yield(RedisModuleCtx *ctx, int flags, const char *busy_reply);
This API allows modules to let Redis process background tasks, and some commands, during long blocking execution of a module command. The module can call this API periodically. The flags argument is a bit mask of these:
REDISMODULE_YIELD_FLAG_NONE
: No special flags, can perform some background operations, but not process client commands.REDISMODULE_YIELD_FLAG_CLIENTS
: Redis can also process client commands.
The busy_reply
argument is optional, and can be used to control the verbose
error string after the -BUSY
error code.
When the REDISMODULE_YIELD_FLAG_CLIENTS
is used, Redis will only start
processing client commands after the time defined by the
busy-reply-threshold
config, in which case Redis will start rejecting most
commands with -BUSY
error, but allow the ones marked with the allow-busy
flag to be executed.
This API can also be used in thread safe context (while locked), and during
loading (in the rdb_load
callback, in which case it’ll reject commands with
the -LOADING error)
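For example, a long-running command might yield every so often; a sketch, where n_items and process_item() are hypothetical:

```c
/* Sketch: periodically yield from a long loop so Redis can serve
 * other clients; commands rejected meanwhile get a -BUSY error with
 * the given message appended. */
for (long i = 0; i < n_items; i++) {
    process_item(i);
    if (i % 1000 == 0)
        RedisModule_Yield(ctx, REDISMODULE_YIELD_FLAG_CLIENTS,
                          "MYMODULE is busy processing items");
}
```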
RedisModule_SetModuleOptions
void RedisModule_SetModuleOptions(RedisModuleCtx *ctx, int options);
Available since: 6.0.0
Set module options, defining capabilities or behavior via bit flags.
REDISMODULE_OPTIONS_HANDLE_IO_ERRORS
:
Generally, modules don’t need to bother with this, as the process will just
terminate if a read error happens. However, setting this flag allows
repl-diskless-load to work if enabled.
The module should use RedisModule_IsIOError
after reads, before using the
data that was read, and in case of error, propagate it upwards, and also be
able to release the partially populated value and all its allocations.
REDISMODULE_OPTION_NO_IMPLICIT_SIGNAL_MODIFIED
:
See RedisModule_SignalModifiedKey()
.
REDISMODULE_OPTIONS_HANDLE_REPL_ASYNC_LOAD
:
Setting this flag indicates module awareness of diskless async replication (repl-diskless-load=swapdb)
and that Redis could be serving reads during replication instead of blocking with a LOADING status.
RedisModule_SignalModifiedKey
int RedisModule_SignalModifiedKey(RedisModuleCtx *ctx,
RedisModuleString *keyname);
Available since: 6.0.0
Signals that the key is modified from the user’s perspective (i.e., invalidates WATCH and client-side caching).
This is done automatically when a key opened for writing is closed, unless
the option REDISMODULE_OPTION_NO_IMPLICIT_SIGNAL_MODIFIED
has been set using
RedisModule_SetModuleOptions()
.
Automatic memory management for modules
RedisModule_AutoMemory
void RedisModule_AutoMemory(RedisModuleCtx *ctx);
Available since: 4.0.0
Enable automatic memory management.
The function must be called as the first function of a command implementation that wants to use automatic memory.
When enabled, automatic memory management tracks and automatically frees keys, call replies and Redis string objects once the command returns. In most cases this eliminates the need of calling the following functions:
These functions can still be used with automatic memory management enabled, to optimize loops that make numerous allocations for example.
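A sketch of a command using automatic memory management (the command implementation and its name are assumptions for the example):

```c
/* Sketch: with automatic memory enabled, the call reply below is
 * freed automatically when the command returns. */
int MyGet_Command(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
    RedisModule_AutoMemory(ctx);  /* must be the first call */
    if (argc != 2) return RedisModule_WrongArity(ctx);

    RedisModuleCallReply *reply =
        RedisModule_Call(ctx, "GET", "s", argv[1]); /* no explicit free needed */
    RedisModule_ReplyWithCallReply(ctx, reply);
    return REDISMODULE_OK;
}
```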
String objects APIs
RedisModule_CreateString
RedisModuleString *RedisModule_CreateString(RedisModuleCtx *ctx,
const char *ptr,
size_t len);
Available since: 4.0.0
Create a new module string object. The returned string must be freed
with RedisModule_FreeString()
, unless automatic memory is enabled.
The string is created by copying the len
bytes starting
at ptr
. No reference is retained to the passed buffer.
The module context ‘ctx’ is optional and may be NULL if you want to create a string out of the context scope. However in that case, the automatic memory management will not be available, and the string memory must be managed manually.
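For example, a minimal create/free cycle looks like this:

```c
/* Sketch: create a string by copying 5 bytes, then release it.
 * The explicit free is unnecessary if automatic memory is enabled. */
RedisModuleString *s = RedisModule_CreateString(ctx, "hello", 5);
/* ... use s ... */
RedisModule_FreeString(ctx, s);
```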
RedisModule_CreateStringPrintf
RedisModuleString *RedisModule_CreateStringPrintf(RedisModuleCtx *ctx,
const char *fmt,
...);
Available since: 4.0.0
Create a new module string object from a printf format and arguments.
The returned string must be freed with RedisModule_FreeString()
, unless
automatic memory is enabled.
The string is created using the sds formatter function sdscatvprintf()
.
The passed context ‘ctx’ may be NULL if necessary, see the
RedisModule_CreateString()
documentation for more info.
RedisModule_CreateStringFromLongLong
RedisModuleString *RedisModule_CreateStringFromLongLong(RedisModuleCtx *ctx,
long long ll);
Available since: 4.0.0
Like RedisModule_CreateString()
, but creates a string starting from a long long
integer instead of taking a buffer and its length.
The returned string must be released with RedisModule_FreeString()
or by
enabling automatic memory management.
The passed context ‘ctx’ may be NULL if necessary, see the
RedisModule_CreateString()
documentation for more info.
RedisModule_CreateStringFromDouble
RedisModuleString *RedisModule_CreateStringFromDouble(RedisModuleCtx *ctx,
double d);
Available since: 6.0.0
Like RedisModule_CreateString()
, but creates a string starting from a double
instead of taking a buffer and its length.
The returned string must be released with RedisModule_FreeString()
or by
enabling automatic memory management.
RedisModule_CreateStringFromLongDouble
RedisModuleString *RedisModule_CreateStringFromLongDouble(RedisModuleCtx *ctx,
long double ld,
int humanfriendly);
Available since: 6.0.0
Like RedisModule_CreateString()
, but creates a string starting from a long
double.
The returned string must be released with RedisModule_FreeString()
or by
enabling automatic memory management.
The passed context ‘ctx’ may be NULL if necessary, see the
RedisModule_CreateString()
documentation for more info.
RedisModule_CreateStringFromString
RedisModuleString *RedisModule_CreateStringFromString(RedisModuleCtx *ctx,
const RedisModuleString *str);
Available since: 4.0.0
Like RedisModule_CreateString()
, but creates a string starting from another
RedisModuleString
.
The returned string must be released with RedisModule_FreeString()
or by
enabling automatic memory management.
The passed context ‘ctx’ may be NULL if necessary, see the
RedisModule_CreateString()
documentation for more info.
RedisModule_CreateStringFromStreamID
RedisModuleString *RedisModule_CreateStringFromStreamID(RedisModuleCtx *ctx,
const RedisModuleStreamID *id);
Available since: 6.2.0
Creates a string from a stream ID. The returned string must be released with
RedisModule_FreeString()
, unless automatic memory is enabled.
The passed context ctx
may be NULL if necessary. See the
RedisModule_CreateString()
documentation for more info.
RedisModule_FreeString
void RedisModule_FreeString(RedisModuleCtx *ctx, RedisModuleString *str);
Available since: 4.0.0
Free a module string object obtained with one of the Redis modules API calls that return new string objects.
It is possible to call this function even when automatic memory management is enabled. In that case the string will be released ASAP and removed from the pool of strings to release at the end.
If the string was created with a NULL context ‘ctx’, it is also possible to pass ctx as NULL when releasing the string (but passing a context will not create any issue). Strings created with a context should be freed also passing the context, so if you want to free a string out of context later, make sure to create it using a NULL context.
RedisModule_RetainString
void RedisModule_RetainString(RedisModuleCtx *ctx, RedisModuleString *str);
Available since: 4.0.0
Every call to this function makes the string ‘str’ require
an additional call to RedisModule_FreeString()
in order to really
free the string. Note that the automatic freeing of the string obtained
by enabling the module's automatic memory management counts as one
RedisModule_FreeString()
call (it is just executed automatically).
Normally you want to call this function when, at the same time the following conditions are true:
- You have automatic memory management enabled.
- You want to create string objects.
- Those string objects you create need to live after the callback function (for example a command implementation) creating them returns.
Usually you want this in order to store the created string object into your own data structure, for example when implementing a new data type.
Note that when memory management is turned off, you don’t need any call to RetainString() since creating a string will always result into a string that lives after the callback function returns, if no FreeString() call is performed.
It is possible to call this function with a NULL context.
When strings are going to be retained for an extended duration, it is good
practice to also call RedisModule_TrimStringAllocation()
in order to
optimize memory usage.
Threaded modules that reference retained strings from other threads must explicitly trim the allocation as soon as the string is retained. Not doing so may result in automatic trimming, which is not thread safe.
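A sketch of the typical pattern (the obj structure is a hypothetical module-owned data type):

```c
/* Sketch: keep argv[1] alive inside module-owned data after the
 * command returns (automatic memory management is assumed enabled). */
RedisModule_RetainString(ctx, argv[1]);
RedisModule_TrimStringAllocation(argv[1]); /* trim before other threads see it */
obj->name = argv[1];  /* eventually released with RedisModule_FreeString() */
```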
RedisModule_HoldString
RedisModuleString* RedisModule_HoldString(RedisModuleCtx *ctx,
RedisModuleString *str);
Available since: 6.0.7
This function can be used instead of RedisModule_RetainString()
.
The main difference between the two is that this function will always
succeed, whereas RedisModule_RetainString()
may fail because of an
assertion.
The function returns a pointer to RedisModuleString
, which is owned
by the caller. It requires a call to RedisModule_FreeString()
to free
the string when automatic memory management is disabled for the context.
When automatic memory management is enabled, you can either call
RedisModule_FreeString()
or let the automation free it.
This function is more efficient than RedisModule_CreateStringFromString()
because whenever possible, it avoids copying the underlying
RedisModuleString
. The disadvantage of using this function is that it
might not be possible to use RedisModule_StringAppendBuffer()
on the
returned RedisModuleString
.
It is possible to call this function with a NULL context.
When strings are going to be held for an extended duration, it is good
practice to also call RedisModule_TrimStringAllocation()
in order to
optimize memory usage.
Threaded modules that reference held strings from other threads must explicitly trim the allocation as soon as the string is held. Not doing so may result in automatic trimming, which is not thread safe.
RedisModule_StringPtrLen
const char *RedisModule_StringPtrLen(const RedisModuleString *str,
size_t *len);
Available since: 4.0.0
Given a string module object, this function returns the string pointer and length of the string. The returned pointer and length should only be used for read-only access and never modified.
RedisModule_StringToLongLong
int RedisModule_StringToLongLong(const RedisModuleString *str, long long *ll);
Available since: 4.0.0
Convert the string into a long long integer, storing it at *ll
.
Returns REDISMODULE_OK
on success. If the string can’t be parsed
as a valid, strict long long (no spaces before/after), REDISMODULE_ERR
is returned.
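For example, a command can validate a numeric argument like this (the error message is an assumption for the example):

```c
/* Sketch: strict parsing of argv[2] as a long long. */
long long count;
if (RedisModule_StringToLongLong(argv[2], &count) != REDISMODULE_OK)
    return RedisModule_ReplyWithError(ctx, "ERR count is not an integer");
```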
RedisModule_StringToDouble
int RedisModule_StringToDouble(const RedisModuleString *str, double *d);
Available since: 4.0.0
Convert the string into a double, storing it at *d
.
Returns REDISMODULE_OK
on success or REDISMODULE_ERR
if the string is
not a valid string representation of a double value.
RedisModule_StringToLongDouble
int RedisModule_StringToLongDouble(const RedisModuleString *str,
long double *ld);
Available since: 6.0.0
Convert the string into a long double, storing it at *ld
.
Returns REDISMODULE_OK
on success or REDISMODULE_ERR
if the string is
not a valid string representation of a long double value.
RedisModule_StringToStreamID
int RedisModule_StringToStreamID(const RedisModuleString *str,
RedisModuleStreamID *id);
Available since: 6.2.0
Convert the string into a stream ID, storing it at *id
.
Returns REDISMODULE_OK
on success and returns REDISMODULE_ERR
if the string
is not a valid string representation of a stream ID. The special IDs “+” and
“-” are allowed.
RedisModule_StringCompare
int RedisModule_StringCompare(RedisModuleString *a, RedisModuleString *b);
Available since: 4.0.0
Compare two string objects, returning -1, 0 or 1 respectively if a < b, a == b, a > b. Strings are compared byte by byte as two binary blobs without any encoding care / collation attempt.
RedisModule_StringAppendBuffer
int RedisModule_StringAppendBuffer(RedisModuleCtx *ctx,
RedisModuleString *str,
const char *buf,
size_t len);
Available since: 4.0.0
Append the specified buffer to the string ‘str’. The string must be a
string created by the user that is referenced only a single time, otherwise
REDISMODULE_ERR
is returned and the operation is not performed.
RedisModule_TrimStringAllocation
void RedisModule_TrimStringAllocation(RedisModuleString *str);
Trim possible excess memory allocated for a RedisModuleString
.
Sometimes a RedisModuleString
may have more memory allocated for
it than required, typically for argv arguments that were constructed
from network buffers. This function optimizes such strings by reallocating
their memory, which is useful for strings that are not short lived but
retained for an extended duration.
This operation is not thread safe and should only be called when no concurrent access to the string is guaranteed. Using it for an argv string in a module command before the string is potentially available to other threads is generally safe.
Currently, Redis may also automatically trim retained strings when a module command returns. However, doing this explicitly should still be a preferred option:
- Future versions of Redis may abandon auto-trimming.
- Auto-trimming as currently implemented is not thread safe. A background thread manipulating a recently retained string may end up in a race condition with the auto-trim, which could result in data corruption.
Reply APIs
These functions are used for sending replies to the client.
Most functions always return REDISMODULE_OK
so you can use it with
‘return’ in order to return from the command implementation with:
if (... some condition ...)
return RedisModule_ReplyWithLongLong(ctx,mycount);
Reply with collection functions
After starting a collection reply, the module must make calls to other
ReplyWith*
style functions in order to emit the elements of the collection.
Collection types include: Array, Map, Set and Attribute.
When producing collections with a number of elements that is not known
beforehand, the function can be called with a special flag
REDISMODULE_POSTPONED_LEN
(REDISMODULE_POSTPONED_ARRAY_LEN
in the past),
and the actual number of elements can be set later with a RedisModule_ReplySet*Length()
call (which will set the latest “open” count if there are multiple ones).
RedisModule_WrongArity
int RedisModule_WrongArity(RedisModuleCtx *ctx);
Available since: 4.0.0
Send an error about the number of arguments given to the command,
citing the command name in the error message. Returns REDISMODULE_OK
.
Example:
if (argc != 3) return RedisModule_WrongArity(ctx);
RedisModule_ReplyWithLongLong
int RedisModule_ReplyWithLongLong(RedisModuleCtx *ctx, long long ll);
Available since: 4.0.0
Send an integer reply to the client, with the specified long long value.
The function always returns REDISMODULE_OK
.
RedisModule_ReplyWithError
int RedisModule_ReplyWithError(RedisModuleCtx *ctx, const char *err);
Available since: 4.0.0
Reply with the error ‘err’.
Note that ‘err’ must contain all the error, including the initial error code. The function only provides the initial “-”, so the usage is, for example:
RedisModule_ReplyWithError(ctx,"ERR Wrong Type");
and not just:
RedisModule_ReplyWithError(ctx,"Wrong Type");
The function always returns REDISMODULE_OK
.
RedisModule_ReplyWithSimpleString
int RedisModule_ReplyWithSimpleString(RedisModuleCtx *ctx, const char *msg);
Available since: 4.0.0
Reply with a simple string (+... \r\n
in the RESP protocol). These replies
are suitable only for sending a small non-binary string with low
overhead, like “OK” or similar replies.
The function always returns REDISMODULE_OK
.
RedisModule_ReplyWithArray
int RedisModule_ReplyWithArray(RedisModuleCtx *ctx, long len);
Available since: 4.0.0
Reply with an array type of ‘len’ elements.
After starting an array reply, the module must make len
calls to other
ReplyWith*
style functions in order to emit the elements of the array.
See Reply APIs section for more details.
Use RedisModule_ReplySetArrayLength()
to set deferred length.
The function always returns REDISMODULE_OK
.
RedisModule_ReplyWithMap
int RedisModule_ReplyWithMap(RedisModuleCtx *ctx, long len);
Reply with a RESP3 Map type of ‘len’ pairs. Visit https://github.com/antirez/RESP3/blob/master/spec.md for more info about RESP3.
After starting a map reply, the module must make len*2
calls to other
ReplyWith*
style functions in order to emit the elements of the map.
See Reply APIs section for more details.
If the connected client is using RESP2, the reply will be converted to a flat array.
Use RedisModule_ReplySetMapLength()
to set deferred length.
The function always returns REDISMODULE_OK
.
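A sketch of emitting a map of two pairs (the field names and values are assumptions for the example):

```c
/* Sketch: a map with len=2 requires 2*2 = 4 ReplyWith* calls.
 * RESP2 clients receive the same data as a flat 4-element array. */
RedisModule_ReplyWithMap(ctx, 2);
RedisModule_ReplyWithSimpleString(ctx, "name");
RedisModule_ReplyWithSimpleString(ctx, "mymodule");
RedisModule_ReplyWithSimpleString(ctx, "version");
RedisModule_ReplyWithLongLong(ctx, 1);
```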
RedisModule_ReplyWithSet
int RedisModule_ReplyWithSet(RedisModuleCtx *ctx, long len);
Reply with a RESP3 Set type of ‘len’ elements. Visit https://github.com/antirez/RESP3/blob/master/spec.md for more info about RESP3.
After starting a set reply, the module must make len
calls to other
ReplyWith*
style functions in order to emit the elements of the set.
See Reply APIs section for more details.
If the connected client is using RESP2, the reply will be converted to an array type.
Use RedisModule_ReplySetSetLength()
to set deferred length.
The function always returns REDISMODULE_OK
.
RedisModule_ReplyWithAttribute
int RedisModule_ReplyWithAttribute(RedisModuleCtx *ctx, long len);
Add attributes (metadata) to the reply. Should be done before adding the actual reply. See https://github.com/antirez/RESP3/blob/master/spec.md#attribute-type
After starting an attributes reply, the module must make len*2
calls to other
ReplyWith*
style functions in order to emit the elements of the attribute map.
See Reply APIs section for more details.
Use RedisModule_ReplySetAttributeLength()
to set deferred length.
Not supported by RESP2 and will return REDISMODULE_ERR
, otherwise
the function always returns REDISMODULE_OK
.
RedisModule_ReplyWithNullArray
int RedisModule_ReplyWithNullArray(RedisModuleCtx *ctx);
Available since: 6.0.0
Reply to the client with a null array, simply null in RESP3, null array in RESP2.
Note: In RESP3 there’s no difference between Null reply and
NullArray reply, so to prevent ambiguity it’s better to avoid
using this API and use RedisModule_ReplyWithNull
instead.
The function always returns REDISMODULE_OK
.
RedisModule_ReplyWithEmptyArray
int RedisModule_ReplyWithEmptyArray(RedisModuleCtx *ctx);
Available since: 6.0.0
Reply to the client with an empty array.
The function always returns REDISMODULE_OK
.
RedisModule_ReplySetArrayLength
void RedisModule_ReplySetArrayLength(RedisModuleCtx *ctx, long len);
Available since: 4.0.0
When RedisModule_ReplyWithArray()
is used with the argument
REDISMODULE_POSTPONED_LEN
, because we don’t know beforehand the number
of items we are going to output as elements of the array, this function
will take care to set the array length.
Since it is possible to have multiple array replies pending with unknown length, this function guarantees to always set the latest array length that was created in a postponed way.
For example in order to output an array like [1,[10,20,30]] we could write:
RedisModule_ReplyWithArray(ctx,REDISMODULE_POSTPONED_LEN);
RedisModule_ReplyWithLongLong(ctx,1);
RedisModule_ReplyWithArray(ctx,REDISMODULE_POSTPONED_LEN);
RedisModule_ReplyWithLongLong(ctx,10);
RedisModule_ReplyWithLongLong(ctx,20);
RedisModule_ReplyWithLongLong(ctx,30);
RedisModule_ReplySetArrayLength(ctx,3); // Set len of 10,20,30 array.
RedisModule_ReplySetArrayLength(ctx,2); // Set len of top array
Note that in the above example there is no reason to postpone the array length, since we produce a fixed number of elements, but in practice the code may use an iterator or other ways of creating the output, so that it is not easy to calculate in advance the number of elements.
RedisModule_ReplySetMapLength
void RedisModule_ReplySetMapLength(RedisModuleCtx *ctx, long len);
Very similar to RedisModule_ReplySetArrayLength
except len
should be
exactly half of the number of ReplyWith*
functions called in the
context of the map.
Visit https://github.com/antirez/RESP3/blob/master/spec.md for more info about RESP3.
RedisModule_ReplySetSetLength
void RedisModule_ReplySetSetLength(RedisModuleCtx *ctx, long len);
Very similar to RedisModule_ReplySetArrayLength.
Visit https://github.com/antirez/RESP3/blob/master/spec.md for more info about RESP3.
RedisModule_ReplySetAttributeLength
void RedisModule_ReplySetAttributeLength(RedisModuleCtx *ctx, long len);
Very similar to RedisModule_ReplySetMapLength.
Visit https://github.com/antirez/RESP3/blob/master/spec.md for more info about RESP3.
RedisModule_ReplyWithStringBuffer
int RedisModule_ReplyWithStringBuffer(RedisModuleCtx *ctx,
const char *buf,
size_t len);
Available since: 4.0.0
Reply with a bulk string, taking in input a C buffer pointer and length.
The function always returns REDISMODULE_OK
.
RedisModule_ReplyWithCString
int RedisModule_ReplyWithCString(RedisModuleCtx *ctx, const char *buf);
Available since: 5.0.6
Reply with a bulk string, taking in input a C buffer pointer that is assumed to be null-terminated.
The function always returns REDISMODULE_OK
.
RedisModule_ReplyWithString
int RedisModule_ReplyWithString(RedisModuleCtx *ctx, RedisModuleString *str);
Available since: 4.0.0
Reply with a bulk string, taking in input a RedisModuleString
object.
The function always returns REDISMODULE_OK
.
RedisModule_ReplyWithEmptyString
int RedisModule_ReplyWithEmptyString(RedisModuleCtx *ctx);
Available since: 6.0.0
Reply with an empty string.
The function always returns REDISMODULE_OK
.
RedisModule_ReplyWithVerbatimStringType
int RedisModule_ReplyWithVerbatimStringType(RedisModuleCtx *ctx,
const char *buf,
size_t len,
const char *ext);
Reply with a binary safe string, which should not be escaped or filtered, taking in input a C buffer pointer, length and a 3 character type/extension.
The function always returns REDISMODULE_OK
.
RedisModule_ReplyWithVerbatimString
int RedisModule_ReplyWithVerbatimString(RedisModuleCtx *ctx,
const char *buf,
size_t len);
Available since: 6.0.0
Reply with a binary safe string, which should not be escaped or filtered, taking in input a C buffer pointer and length.
The function always returns REDISMODULE_OK
.
RedisModule_ReplyWithNull
int RedisModule_ReplyWithNull(RedisModuleCtx *ctx);
Available since: 4.0.0
Reply to the client with a NULL.
The function always returns REDISMODULE_OK
.
RedisModule_ReplyWithBool
int RedisModule_ReplyWithBool(RedisModuleCtx *ctx, int b);
Reply with a RESP3 Boolean type. Visit https://github.com/antirez/RESP3/blob/master/spec.md for more info about RESP3.
In RESP3 this is a boolean type. In RESP2 it’s a string response of “1” and “0” for true and false respectively.
The function always returns REDISMODULE_OK
.
RedisModule_ReplyWithCallReply
int RedisModule_ReplyWithCallReply(RedisModuleCtx *ctx,
RedisModuleCallReply *reply);
Available since: 4.0.0
Reply exactly what a Redis command returned us with RedisModule_Call()
.
This function is useful when we use RedisModule_Call()
in order to
execute some command, as we want to reply to the client exactly the
same reply we obtained by the command.
Return:
- REDISMODULE_OK on success.
- REDISMODULE_ERR if the given reply is in RESP3 format but the client expects RESP2. In case of an error, it is the module writer’s responsibility to translate the reply to RESP2 (or handle it differently by returning an error). Note that for the module writer’s convenience, it is possible to pass 0 as the fmt argument of RM_Call so that the RedisModuleCallReply will be returned in the same protocol (RESP2 or RESP3) as set in the current client’s context.
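A minimal sketch of this pattern (the command name is hypothetical): call GET through RedisModule_Call() and forward whatever it returned, using the 0 format flag so the reply protocol matches the client’s:

```c
/* Hypothetical command: forward GET's reply verbatim to the caller. */
int ProxyGet_RedisCommand(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
    if (argc != 2) return RedisModule_WrongArity(ctx);
    /* "0" flag: produce the reply in the same protocol (RESP2/RESP3)
     * as the client attached to this context. */
    RedisModuleCallReply *reply = RedisModule_Call(ctx, "GET", "s0", argv[1]);
    if (reply == NULL) return RedisModule_ReplyWithError(ctx, "ERR call failed");
    int rc = RedisModule_ReplyWithCallReply(ctx, reply);
    RedisModule_FreeCallReply(reply);
    return rc;
}
```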
RedisModule_ReplyWithDouble
int RedisModule_ReplyWithDouble(RedisModuleCtx *ctx, double d);
Available since: 4.0.0
Reply with a RESP3 Double type. Visit https://github.com/antirez/RESP3/blob/master/spec.md for more info about RESP3.
Send a string reply obtained by converting the double ‘d’ into a bulk string.
This function is basically equivalent to converting a double into
a string into a C buffer, and then calling the function
RedisModule_ReplyWithStringBuffer()
with the buffer and length.
In RESP3 the string is tagged as a double, while in RESP2 it’s just a plain string that the user will have to parse.
The function always returns REDISMODULE_OK
.
RedisModule_ReplyWithBigNumber
int RedisModule_ReplyWithBigNumber(RedisModuleCtx *ctx,
const char *bignum,
size_t len);
Reply with a RESP3 BigNumber type. Visit https://github.com/antirez/RESP3/blob/master/spec.md for more info about RESP3.
In RESP3, this is a string of length len
that is tagged as a BigNumber,
however, it’s up to the caller to ensure that it’s a valid BigNumber.
In RESP2, this is just a plain bulk string response.
The function always returns REDISMODULE_OK
.
RedisModule_ReplyWithLongDouble
int RedisModule_ReplyWithLongDouble(RedisModuleCtx *ctx, long double ld);
Available since: 6.0.0
Send a string reply obtained by converting the long double ‘ld’ into a bulk
string. This function is basically equivalent to converting a long double
into a string into a C buffer, and then calling the function
RedisModule_ReplyWithStringBuffer()
with the buffer and length.
The double string uses human readable formatting (see
addReplyHumanLongDouble
in networking.c).
The function always returns REDISMODULE_OK
.
Commands replication API
RedisModule_Replicate
int RedisModule_Replicate(RedisModuleCtx *ctx,
const char *cmdname,
const char *fmt,
...);
Available since: 4.0.0
Replicate the specified command and arguments to slaves and AOF, as an effect of execution of the calling command implementation.
The replicated commands are always wrapped into the MULTI/EXEC that
contains all the commands replicated in a given module command
execution. However, the commands replicated with RedisModule_Call()
are the first items; the ones replicated with RedisModule_Replicate()
will all follow before the EXEC.
Modules should try to use one interface or the other.
This command follows exactly the same interface of RedisModule_Call()
,
so a set of format specifiers must be passed, followed by arguments
matching the provided format specifiers.
Please refer to RedisModule_Call()
for more information.
Using the special “A” and “R” modifiers, the caller can exclude either the AOF or the replicas from the propagation of the specified command. Otherwise, by default, the command will be propagated in both channels.
Note about calling this function from a thread safe context:
Normally when you call this function from the callback implementing a module command, or any other callback provided by the Redis Module API, Redis will accumulate all the calls to this function in the context of the callback, and will propagate all the commands wrapped in a MULTI/EXEC transaction. However, when calling this function from a thread safe context that can live for an undefined amount of time, and can be locked/unlocked at will, the behavior is different: the MULTI/EXEC wrapper is not emitted and the command specified is inserted into the AOF and replication stream immediately.
Return value
The command returns REDISMODULE_ERR
if the format specifiers are invalid
or the command name does not belong to a known command.
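For example, a hypothetical command might apply a change locally and propagate its deterministic effect rather than the command itself; a sketch:

```c
/* Hypothetical MYINCR: increment locally, but replicate a deterministic SET
 * so replicas and the AOF record the effect rather than the command. */
int MyIncr_RedisCommand(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
    if (argc != 2) return RedisModule_WrongArity(ctx);
    /* Without the "!" modifier this RedisModule_Call() is not replicated. */
    RedisModuleCallReply *reply = RedisModule_Call(ctx, "INCR", "s", argv[1]);
    long long newval = RedisModule_CallReplyInteger(reply);
    RedisModule_FreeCallReply(reply);
    RedisModule_Replicate(ctx, "SET", "sl", argv[1], newval);
    return RedisModule_ReplyWithLongLong(ctx, newval);
}
```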
RedisModule_ReplicateVerbatim
int RedisModule_ReplicateVerbatim(RedisModuleCtx *ctx);
Available since: 4.0.0
This function will replicate the command exactly as it was invoked by the client. Note that this function will not wrap the command into a MULTI/EXEC stanza, so it should not be mixed with other replication commands.
Basically this form of replication is useful when you want to propagate the command to the slaves and AOF file exactly as it was called, since the command can just be re-executed to deterministically re-create the new state starting from the old one.
The function always returns REDISMODULE_OK
.
DB and Key APIs – Generic API
RedisModule_GetClientId
unsigned long long RedisModule_GetClientId(RedisModuleCtx *ctx);
Available since: 4.0.0
Return the ID of the current client calling the currently active module command. The returned ID has a few guarantees:
- The ID is different for each different client, so if the same client executes a module command multiple times, it can be recognized as having the same ID, otherwise the ID will be different.
- The ID increases monotonically. Clients connecting to the server later are guaranteed to get IDs greater than any past ID previously seen.
Valid IDs are from 1 to 2^64 - 1. If 0 is returned it means there is no way to fetch the ID in the context the function was currently called.
After obtaining the ID, it is possible to check if the command execution is actually happening in the context of AOF loading, using this macro:
if (RedisModule_IsAOFClient(RedisModule_GetClientId(ctx))) {
// Handle it differently.
}
RedisModule_GetClientUserNameById
RedisModuleString *RedisModule_GetClientUserNameById(RedisModuleCtx *ctx,
uint64_t id);
Available since: 6.2.1
Return the ACL user name used by the client with the specified client ID.
Client ID can be obtained with RedisModule_GetClientId()
API. If the client does not
exist, NULL is returned and errno is set to ENOENT. If the client isn’t
using an ACL user, NULL is returned and errno is set to ENOTSUP.
RedisModule_GetClientInfoById
int RedisModule_GetClientInfoById(void *ci, uint64_t id);
Available since: 6.0.0
Return information about the client with the specified ID (that was
previously obtained via the RedisModule_GetClientId()
API). If the
client exists, REDISMODULE_OK
is returned, otherwise REDISMODULE_ERR
is returned.
When the client exists and the ci
pointer is not NULL, but points to
a structure of type RedisModuleClientInfo
, previously initialized with
the correct REDISMODULE_CLIENTINFO_INITIALIZER
, the structure is populated
with the following fields:
uint64_t flags; // REDISMODULE_CLIENTINFO_FLAG_*
uint64_t id; // Client ID
char addr[46]; // IPv4 or IPv6 address.
uint16_t port; // TCP port.
uint16_t db; // Selected DB.
Note: the client ID is useless in the context of this call, since we already know it; however, the same structure could be used in other contexts where we don’t know the client ID, yet the same structure is returned.
With flags having the following meaning:
REDISMODULE_CLIENTINFO_FLAG_SSL Client using SSL connection.
REDISMODULE_CLIENTINFO_FLAG_PUBSUB Client in Pub/Sub mode.
REDISMODULE_CLIENTINFO_FLAG_BLOCKED Client blocked in command.
REDISMODULE_CLIENTINFO_FLAG_TRACKING Client with keys tracking on.
REDISMODULE_CLIENTINFO_FLAG_UNIXSOCKET Client using unix domain socket.
REDISMODULE_CLIENTINFO_FLAG_MULTI Client in MULTI state.
However passing NULL is a way to just check if the client exists in case we are not interested in any additional information.
This is the correct usage when we want the client info structure returned:
RedisModuleClientInfo ci = REDISMODULE_CLIENTINFO_INITIALIZER;
int retval = RedisModule_GetClientInfoById(&ci,client_id);
if (retval == REDISMODULE_OK) {
printf("Address: %s\n", ci.addr);
}
RedisModule_PublishMessage
int RedisModule_PublishMessage(RedisModuleCtx *ctx,
RedisModuleString *channel,
RedisModuleString *message);
Available since: 6.0.0
Publish a message to subscribers (see PUBLISH command).
RedisModule_GetSelectedDb
int RedisModule_GetSelectedDb(RedisModuleCtx *ctx);
Available since: 4.0.0
Return the currently selected DB.
RedisModule_GetContextFlags
int RedisModule_GetContextFlags(RedisModuleCtx *ctx);
Available since: 4.0.3
Return the current context’s flags. The flags provide information on the current request context (whether the client is a Lua script or in a MULTI), and about the Redis instance in general, i.e., replication and persistence.
It is possible to call this function even with a NULL context, however in this case the following flags will not be reported:
- LUA, MULTI, REPLICATED, DIRTY (see below for more info).
Available flags and their meaning:
-
REDISMODULE_CTX_FLAGS_LUA
: The command is running in a Lua script -
REDISMODULE_CTX_FLAGS_MULTI
: The command is running inside a transaction -
REDISMODULE_CTX_FLAGS_REPLICATED
: The command was sent over the replication link by the MASTER -
REDISMODULE_CTX_FLAGS_MASTER
: The Redis instance is a master -
REDISMODULE_CTX_FLAGS_SLAVE
: The Redis instance is a slave -
REDISMODULE_CTX_FLAGS_READONLY
: The Redis instance is read-only -
REDISMODULE_CTX_FLAGS_CLUSTER
: The Redis instance is in cluster mode -
REDISMODULE_CTX_FLAGS_AOF
: The Redis instance has AOF enabled -
REDISMODULE_CTX_FLAGS_RDB
: The instance has RDB enabled -
REDISMODULE_CTX_FLAGS_MAXMEMORY
: The instance has Maxmemory set -
REDISMODULE_CTX_FLAGS_EVICT
: Maxmemory is set and has an eviction policy that may delete keys -
REDISMODULE_CTX_FLAGS_OOM
: Redis is out of memory according to the maxmemory setting. -
REDISMODULE_CTX_FLAGS_OOM_WARNING
: Less than 25% of memory remains before reaching the maxmemory level. -
REDISMODULE_CTX_FLAGS_LOADING
: Server is loading RDB/AOF -
REDISMODULE_CTX_FLAGS_REPLICA_IS_STALE
: No active link with the master. -
REDISMODULE_CTX_FLAGS_REPLICA_IS_CONNECTING
: The replica is trying to connect with the master. -
REDISMODULE_CTX_FLAGS_REPLICA_IS_TRANSFERRING
: Master -> Replica RDB transfer is in progress. -
REDISMODULE_CTX_FLAGS_REPLICA_IS_ONLINE
: The replica has an active link with its master. This is the contrary of STALE state. -
REDISMODULE_CTX_FLAGS_ACTIVE_CHILD
: There is currently some background process active (RDB, AUX or module). -
REDISMODULE_CTX_FLAGS_MULTI_DIRTY
: The next EXEC will fail due to dirty CAS (touched keys). -
REDISMODULE_CTX_FLAGS_IS_CHILD
: Redis is currently running inside background child process. -
REDISMODULE_CTX_FLAGS_RESP3
: Indicates that the client attached to this context is using RESP3.
RedisModule_AvoidReplicaTraffic
int RedisModule_AvoidReplicaTraffic();
Available since: 6.0.0
Returns true if a client sent the CLIENT PAUSE command to the server, or if Redis Cluster is doing a manual failover, pausing the clients. This is needed when we have a master with replicas and want the replicas’ replication offsets to match that of the master without adding further data to the replication channel; when this happens, it is safe to failover the master without data loss.
However modules may generate traffic by calling RedisModule_Call()
with
the “!” flag, or by calling RedisModule_Replicate()
, in a context outside
commands execution, for instance in timeout callbacks, threads safe
contexts, and so forth. If modules generate too much traffic, it
will be hard for the master and replica offsets to match, because there
is more data to send in the replication channel.
So modules may want to try to avoid very heavy background work that has the effect of adding data to the replication channel, when this function returns true. This is mostly useful for modules that have background garbage collection tasks, or that do writes and replicate such writes periodically in timer callbacks or other periodic callbacks.
RedisModule_SelectDb
int RedisModule_SelectDb(RedisModuleCtx *ctx, int newid);
Available since: 4.0.0
Change the currently selected DB. Returns an error if the id is out of range.
Note that the client will retain the currently selected DB even after the Redis command implemented by the module calling this function returns.
If the module command wishes to change something in a different DB and
then return to the original one, it should call RedisModule_GetSelectedDb()
beforehand, in order to be able to restore the old DB number before returning.
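The save-and-restore pattern described above can be sketched as follows, assuming ctx is a valid module command context:

```c
/* Temporarily work against DB 1, then restore the client's original DB. */
int prev = RedisModule_GetSelectedDb(ctx);
if (RedisModule_SelectDb(ctx, 1) == REDISMODULE_OK) {
    /* ... open keys or run commands against DB 1 here ... */
    RedisModule_SelectDb(ctx, prev);   /* restore before returning */
}
```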
RedisModule_KeyExists
int RedisModule_KeyExists(RedisModuleCtx *ctx, robj *keyname);
Check if a key exists, without affecting its last access time.
This is equivalent to calling RedisModule_OpenKey
with the mode REDISMODULE_READ
|
REDISMODULE_OPEN_KEY_NOTOUCH
, then checking if NULL was returned and, if not,
calling RedisModule_CloseKey
on the opened key.
RedisModule_OpenKey
void *RedisModule_OpenKey(RedisModuleCtx *ctx, robj *keyname, int mode);
Available since: 4.0.0
Return a handle representing a Redis key, so that it is possible to call other APIs with the key handle as argument to perform operations on the key.
The return value is the handle representing the key, that must be
closed with RedisModule_CloseKey()
.
If the key does not exist and WRITE mode is requested, the handle
is still returned, since it is possible to perform operations on
a yet not existing key (that will be created, for example, after
a list push operation). If the mode is just READ instead, and the
key does not exist, NULL is returned. However it is still safe to
call RedisModule_CloseKey()
and RedisModule_KeyType()
on a NULL
value.
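A read-only sketch of the open/inspect/close flow, relying on the fact that KeyType and CloseKey are NULL-safe (argv[1] is assumed to be the key name argument of a command callback):

```c
/* Open read-only: NULL is returned if the key does not exist. */
RedisModuleKey *key = RedisModule_OpenKey(ctx, argv[1], REDISMODULE_READ);
/* Returns REDISMODULE_KEYTYPE_EMPTY when key is NULL. */
int type = RedisModule_KeyType(key);
if (type == REDISMODULE_KEYTYPE_STRING) {
    RedisModule_ReplyWithLongLong(ctx, (long long)RedisModule_ValueLength(key));
} else {
    RedisModule_ReplyWithNull(ctx);
}
RedisModule_CloseKey(key);  /* safe to call on NULL as well */
```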
RedisModule_CloseKey
void RedisModule_CloseKey(RedisModuleKey *key);
Available since: 4.0.0
Close a key handle.
RedisModule_KeyType
int RedisModule_KeyType(RedisModuleKey *key);
Available since: 4.0.0
Return the type of the key. If the key pointer is NULL then
REDISMODULE_KEYTYPE_EMPTY
is returned.
RedisModule_ValueLength
size_t RedisModule_ValueLength(RedisModuleKey *key);
Available since: 4.0.0
Return the length of the value associated with the key. For strings this is the length of the string. For all the other types it is the number of elements (just counting keys for hashes).
If the key pointer is NULL or the key is empty, zero is returned.
RedisModule_DeleteKey
int RedisModule_DeleteKey(RedisModuleKey *key);
Available since: 4.0.0
If the key is open for writing, remove it, and set up the key to
accept new writes as an empty key (that will be created on demand).
On success REDISMODULE_OK
is returned. If the key is not open for
writing REDISMODULE_ERR
is returned.
RedisModule_UnlinkKey
int RedisModule_UnlinkKey(RedisModuleKey *key);
Available since: 4.0.7
If the key is open for writing, unlink it (that is, delete it in a
non-blocking way, without reclaiming memory immediately) and set up the key to
accept new writes as an empty key (that will be created on demand).
On success REDISMODULE_OK
is returned. If the key is not open for
writing REDISMODULE_ERR
is returned.
RedisModule_GetExpire
mstime_t RedisModule_GetExpire(RedisModuleKey *key);
Available since: 4.0.0
Return the key expire value, as milliseconds of remaining TTL.
If no TTL is associated with the key or if the key is empty,
REDISMODULE_NO_EXPIRE
is returned.
RedisModule_SetExpire
int RedisModule_SetExpire(RedisModuleKey *key, mstime_t expire);
Available since: 4.0.0
Set a new expire for the key. If the special expire
REDISMODULE_NO_EXPIRE
is set, the expire is cancelled if there was
one (the same as the PERSIST command).
Note that the expire must be provided as a positive integer representing the number of milliseconds of TTL the key should have.
The function returns REDISMODULE_OK
on success or REDISMODULE_ERR
if
the key was not open for writing or is an empty key.
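A sketch, assuming key was opened for writing:

```c
/* Give the key a 10-second TTL (expire is in milliseconds). */
if (RedisModule_SetExpire(key, 10 * 1000) == REDISMODULE_ERR) {
    /* key not open for writing, or empty key */
}
/* Remove any TTL: same effect as the PERSIST command. */
RedisModule_SetExpire(key, REDISMODULE_NO_EXPIRE);
```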
RedisModule_GetAbsExpire
mstime_t RedisModule_GetAbsExpire(RedisModuleKey *key);
Available since: 6.2.2
Return the key expire value, as an absolute Unix timestamp.
If no TTL is associated with the key or if the key is empty,
REDISMODULE_NO_EXPIRE
is returned.
RedisModule_SetAbsExpire
int RedisModule_SetAbsExpire(RedisModuleKey *key, mstime_t expire);
Available since: 6.2.2
Set a new expire for the key. If the special expire
REDISMODULE_NO_EXPIRE
is set, the expire is cancelled if there was
one (the same as the PERSIST command).
Note that the expire must be provided as a positive integer representing the absolute Unix timestamp the key should have.
The function returns REDISMODULE_OK
on success or REDISMODULE_ERR
if
the key was not open for writing or is an empty key.
RedisModule_ResetDataset
void RedisModule_ResetDataset(int restart_aof, int async);
Available since: 6.0.0
Performs an operation similar to FLUSHALL, and optionally starts a new AOF file (if enabled).
If restart_aof
is true, you must make sure the command that triggered this call is not
propagated to the AOF file.
When async is set to true, db contents will be freed by a background thread.
RedisModule_DbSize
unsigned long long RedisModule_DbSize(RedisModuleCtx *ctx);
Available since: 6.0.0
Returns the number of keys in the current db.
RedisModule_RandomKey
RedisModuleString *RedisModule_RandomKey(RedisModuleCtx *ctx);
Available since: 6.0.0
Returns the name of a random key, or NULL if the current db is empty.
RedisModule_GetKeyNameFromOptCtx
const RedisModuleString *RedisModule_GetKeyNameFromOptCtx(RedisModuleKeyOptCtx *ctx);
Returns the name of the key currently being processed.
RedisModule_GetToKeyNameFromOptCtx
const RedisModuleString *RedisModule_GetToKeyNameFromOptCtx(RedisModuleKeyOptCtx *ctx);
Returns the name of the target key currently being processed.
RedisModule_GetDbIdFromOptCtx
int RedisModule_GetDbIdFromOptCtx(RedisModuleKeyOptCtx *ctx);
Returns the dbid currently being processed.
RedisModule_GetToDbIdFromOptCtx
int RedisModule_GetToDbIdFromOptCtx(RedisModuleKeyOptCtx *ctx);
Returns the target dbid currently being processed.
Key API for String type
See also RedisModule_ValueLength()
, which returns the length of a string.
RedisModule_StringSet
int RedisModule_StringSet(RedisModuleKey *key, RedisModuleString *str);
Available since: 4.0.0
If the key is open for writing, set the specified string ‘str’ as the
value of the key, deleting the old value if any.
On success REDISMODULE_OK
is returned. If the key is not open for
writing or there is an active iterator, REDISMODULE_ERR
is returned.
RedisModule_StringDMA
char *RedisModule_StringDMA(RedisModuleKey *key, size_t *len, int mode);
Available since: 4.0.0
Prepare the key’s associated string value for DMA access, and return a pointer and size (by reference) that the user can use to read or modify the string in-place, accessing it directly via pointer.
The ‘mode’ is composed by bitwise OR-ing the following flags:
REDISMODULE_READ -- Read access
REDISMODULE_WRITE -- Write access
If the DMA is not requested for writing, the pointer returned should only be accessed in a read-only fashion.
On error (wrong type) NULL is returned.
DMA access rules:
-
No other key writing function should be called since the moment the pointer is obtained, for all the time we want to use DMA access to read or modify the string.
-
Each time
RedisModule_StringTruncate()
is called, to continue with the DMA access,RedisModule_StringDMA()
should be called again to re-obtain a new pointer and length. -
If the returned pointer is not NULL, but the length is zero, no byte can be touched (the string is empty, or the key itself is empty), so a
RedisModule_StringTruncate()
call should be used if there is a need to enlarge the string, and later StringDMA() should be called again to get the pointer.
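Rules 2 and 3 above imply the following shape when growing a string in place; a sketch assuming key is open for writing:

```c
size_t len;
/* Obtain the current length (the pointer itself is unused here). */
RedisModule_StringDMA(key, &len, REDISMODULE_WRITE);
if (RedisModule_StringTruncate(key, len + 1) == REDISMODULE_OK) {
    /* Truncation invalidates the old pointer: re-obtain it. */
    char *buf = RedisModule_StringDMA(key, &len, REDISMODULE_WRITE);
    if (buf != NULL && len > 0) buf[len - 1] = '!';  /* append in place */
}
```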
RedisModule_StringTruncate
int RedisModule_StringTruncate(RedisModuleKey *key, size_t newlen);
Available since: 4.0.0
If the key is open for writing and is of string type, resize it, padding with zero bytes if the new length is greater than the old one.
After this call, RedisModule_StringDMA()
must be called again to continue
DMA access with the new pointer.
The function returns REDISMODULE_OK
on success, and REDISMODULE_ERR
on
error, that is, the key is not open for writing, is not a string
or resizing for more than 512 MB is requested.
If the key is empty, a string key is created with the new string value unless the new length value requested is zero.
Key API for List type
Many of the list functions access elements by index. Since a list is in essence a doubly-linked list, accessing elements by index is generally an O(N) operation. However, if elements are accessed sequentially or with indices close together, the functions are optimized to seek the index from the previous index, rather than seeking from the ends of the list.
This enables iteration to be done efficiently using a simple for loop:
long n = RedisModule_ValueLength(key);
for (long i = 0; i < n; i++) {
RedisModuleString *elem = RedisModule_ListGet(key, i);
// Do stuff...
}
Note that after modifying a list using RedisModule_ListPop
, RedisModule_ListSet
or
RedisModule_ListInsert
, the internal iterator is invalidated so the next operation
will require a linear seek.
Modifying a list in any other way, for example using RedisModule_Call()
, while a key
is open will confuse the internal iterator and may cause trouble if the key
is used after such modifications. The key must be reopened in this case.
See also RedisModule_ValueLength()
, which returns the length of a list.
RedisModule_ListPush
int RedisModule_ListPush(RedisModuleKey *key,
int where,
RedisModuleString *ele);
Available since: 4.0.0
Push an element into a list, on head or tail depending on ‘where’ argument
(REDISMODULE_LIST_HEAD
or REDISMODULE_LIST_TAIL
). If the key refers to an
empty key opened for writing, the key is created. On success, REDISMODULE_OK
is returned. On failure, REDISMODULE_ERR
is returned and errno
is set as
follows:
- EINVAL if key or ele is NULL.
- ENOTSUP if the key is of another type than list.
- EBADF if the key is not opened for writing.
Note: Before Redis 7.0, errno
was not set by this function.
RedisModule_ListPop
RedisModuleString *RedisModule_ListPop(RedisModuleKey *key, int where);
Available since: 4.0.0
Pop an element from the list, and return it as a module string object
that the user should free with RedisModule_FreeString() or by enabling
or by enabling
automatic memory. The where
argument specifies if the element should be
popped from the beginning or the end of the list (REDISMODULE_LIST_HEAD
or
REDISMODULE_LIST_TAIL
). On failure, the command returns NULL and sets
errno
as follows:
- EINVAL if key is NULL.
- ENOTSUP if the key is empty or of another type than list.
- EBADF if the key is not opened for writing.
Note: Before Redis 7.0, errno
was not set by this function.
RedisModule_ListGet
RedisModuleString *RedisModule_ListGet(RedisModuleKey *key, long index);
Returns the element at index index
in the list stored at key
, like the
LINDEX command. The element should be free’d using RedisModule_FreeString()
or using
automatic memory management.
The index is zero-based, so 0 means the first element, 1 the second element and so on. Negative indices can be used to designate elements starting at the tail of the list. Here, -1 means the last element, -2 means the penultimate and so forth.
When no value is found at the given key and index, NULL is returned and
errno
is set as follows:
- EINVAL if key is NULL.
- ENOTSUP if the key is not a list.
- EBADF if the key is not opened for reading.
- EDOM if the index is not a valid index in the list.
RedisModule_ListSet
int RedisModule_ListSet(RedisModuleKey *key,
long index,
RedisModuleString *value);
Replaces the element at index index
in the list stored at key
.
The index is zero-based, so 0 means the first element, 1 the second element and so on. Negative indices can be used to designate elements starting at the tail of the list. Here, -1 means the last element, -2 means the penultimate and so forth.
On success, REDISMODULE_OK
is returned. On failure, REDISMODULE_ERR
is
returned and errno
is set as follows:
- EINVAL if key or value is NULL.
- ENOTSUP if the key is not a list.
- EBADF if the key is not opened for writing.
- EDOM if the index is not a valid index in the list.
RedisModule_ListInsert
int RedisModule_ListInsert(RedisModuleKey *key,
long index,
RedisModuleString *value);
Inserts an element at the given index.
The index is zero-based, so 0 means the first element, 1 the second element and so on. Negative indices can be used to designate elements starting at the tail of the list. Here, -1 means the last element, -2 means the penultimate and so forth. The index is the element’s index after inserting it.
On success, REDISMODULE_OK
is returned. On failure, REDISMODULE_ERR
is
returned and errno
is set as follows:
- EINVAL if key or value is NULL.
- ENOTSUP if the key is of another type than list.
- EBADF if the key is not opened for writing.
- EDOM if the index is not a valid index in the list.
RedisModule_ListDelete
int RedisModule_ListDelete(RedisModuleKey *key, long index);
Removes an element at the given index. The index is 0-based. A negative index can also be used, counting from the end of the list.
On success, REDISMODULE_OK
is returned. On failure, REDISMODULE_ERR
is
returned and errno
is set as follows:
- EINVAL if key is NULL.
- ENOTSUP if the key is not a list.
- EBADF if the key is not opened for writing.
- EDOM if the index is not a valid index in the list.
Key API for Sorted Set type
See also RedisModule_ValueLength()
, which returns the length of a sorted set.
RedisModule_ZsetAdd
int RedisModule_ZsetAdd(RedisModuleKey *key,
double score,
RedisModuleString *ele,
int *flagsptr);
Available since: 4.0.0
Add a new element into a sorted set, with the specified ‘score’. If the element already exists, the score is updated.
A new sorted set is created if the key is an empty key open for writing.
Additional flags can be passed to the function via a pointer, the flags are both used to receive input and to communicate state when the function returns. ‘flagsptr’ can be NULL if no special flags are used.
The input flags are:
- REDISMODULE_ZADD_XX: Element must already exist. Do nothing otherwise.
- REDISMODULE_ZADD_NX: Element must not exist. Do nothing otherwise.
- REDISMODULE_ZADD_GT: If the element exists, the new score must be greater than the current score. Do nothing otherwise. Can optionally be combined with XX.
- REDISMODULE_ZADD_LT: If the element exists, the new score must be less than the current score. Do nothing otherwise. Can optionally be combined with XX.
The output flags are:
- REDISMODULE_ZADD_ADDED: The new element was added to the sorted set.
- REDISMODULE_ZADD_UPDATED: The score of the element was updated.
- REDISMODULE_ZADD_NOP: No operation was performed because of the XX or NX flags.
On success the function returns REDISMODULE_OK. On the following errors REDISMODULE_ERR is returned:
- The key was not opened for writing.
- The key is of the wrong type.
- ‘score’ double value is not a number (NaN).
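As an illustrative sketch (not from the official docs), the input and output flags might be combined like this inside a module command, assuming key is a sorted set key open for writing and ele is a RedisModuleString taken from the command arguments:

```c
/* Update only if the member already exists AND the new score is
 * greater (XX combined with GT), then inspect the output flags. */
int flags = REDISMODULE_ZADD_XX | REDISMODULE_ZADD_GT;
if (RedisModule_ZsetAdd(key, 42.0, ele, &flags) == REDISMODULE_OK) {
    if (flags & REDISMODULE_ZADD_UPDATED) {
        /* The element existed and its score was updated. */
    } else if (flags & REDISMODULE_ZADD_NOP) {
        /* Nothing happened: element missing, or new score not greater. */
    }
} else {
    /* Key not open for writing, wrong type, or score was NaN. */
}
```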
RedisModule_ZsetIncrby
int RedisModule_ZsetIncrby(RedisModuleKey *key,
double score,
RedisModuleString *ele,
int *flagsptr,
double *newscore);
Available since: 4.0.0
This function works exactly like RedisModule_ZsetAdd(), but instead of setting a new score, the score of the existing element is incremented, or if the element does not already exist, it is added assuming the old score was zero.
The input and output flags, and the return value, have exactly the same meaning, with the only difference that this function will return REDISMODULE_ERR even when ‘score’ is a valid double, if adding it to the existing score results in a NaN (not a number) condition.
This function has an additional argument ‘newscore’: if not NULL, it is filled with the new score of the element after the increment, when no error is returned.
RedisModule_ZsetRem
int RedisModule_ZsetRem(RedisModuleKey *key,
RedisModuleString *ele,
int *deleted);
Available since: 4.0.0
Remove the specified element from the sorted set.
The function returns REDISMODULE_OK on success, and REDISMODULE_ERR on one of the following conditions:
- The key was not opened for writing.
- The key is of the wrong type.
The return value does NOT indicate whether the element was actually removed (i.e., whether it existed or not), only whether the function executed successfully.
To know whether the element was removed, pass the additional argument ‘deleted’: the function sets the integer it points to, to 1 or 0 depending on the outcome of the operation. The ‘deleted’ argument can be NULL if the caller is not interested in knowing whether the element was actually removed.
Empty keys will be handled correctly by doing nothing.
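A minimal sketch of the ‘deleted’ out-parameter (illustrative; assumes key is an open sorted set key and ele a RedisModuleString):

```c
int deleted = 0;
if (RedisModule_ZsetRem(key, ele, &deleted) == REDISMODULE_OK) {
    if (deleted) {
        /* The element existed and was removed. */
    } else {
        /* The call succeeded, but the element was not in the set. */
    }
}
```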
RedisModule_ZsetScore
int RedisModule_ZsetScore(RedisModuleKey *key,
RedisModuleString *ele,
double *score);
Available since: 4.0.0
On success, retrieves the double score associated with the sorted set element ‘ele’ and returns REDISMODULE_OK. Otherwise REDISMODULE_ERR is returned to signal one of the following conditions:
- There is no such element ‘ele’ in the sorted set.
- The key is not a sorted set.
- The key is an open empty key.
Key API for Sorted Set iterator
RedisModule_ZsetRangeStop
void RedisModule_ZsetRangeStop(RedisModuleKey *key);
Available since: 4.0.0
Stop a sorted set iteration.
RedisModule_ZsetRangeEndReached
int RedisModule_ZsetRangeEndReached(RedisModuleKey *key);
Available since: 4.0.0
Return the “End of range” flag value, which signals whether the iteration has reached the end of the range.
RedisModule_ZsetFirstInScoreRange
int RedisModule_ZsetFirstInScoreRange(RedisModuleKey *key,
double min,
double max,
int minex,
int maxex);
Available since: 4.0.0
Setup a sorted set iterator seeking the first element in the specified range. Returns REDISMODULE_OK if the iterator was correctly initialized, otherwise REDISMODULE_ERR is returned in the following conditions:
- The value stored at key is not a sorted set or the key is empty.
The range is specified according to the two double values ‘min’ and ‘max’. Both can be infinite using the following two macros:
- REDISMODULE_POSITIVE_INFINITE for positive infinity
- REDISMODULE_NEGATIVE_INFINITE for negative infinity
The ‘minex’ and ‘maxex’ parameters, if true, make the min and max values respectively exclusive (not included) instead of inclusive.
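Together with the iterator functions above, a score-range scan might look like the following sketch (illustrative, not from the official docs; assumes ctx and a sorted set key opened with RedisModule_OpenKey()):

```c
/* Iterate all elements with 1.0 <= score < 5.0 (max exclusive). */
if (RedisModule_ZsetFirstInScoreRange(key, 1.0, 5.0, 0, 1) == REDISMODULE_OK) {
    while (!RedisModule_ZsetRangeEndReached(key)) {
        double score;
        RedisModuleString *ele =
            RedisModule_ZsetRangeCurrentElement(key, &score);
        /* ... use ele and score ... */
        RedisModule_FreeString(ctx, ele); /* not needed with automatic
                                             memory management */
        RedisModule_ZsetRangeNext(key);
    }
    RedisModule_ZsetRangeStop(key);
}
```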
RedisModule_ZsetLastInScoreRange
int RedisModule_ZsetLastInScoreRange(RedisModuleKey *key,
double min,
double max,
int minex,
int maxex);
Available since: 4.0.0
Exactly like RedisModule_ZsetFirstInScoreRange() but the last element of the range is selected for the start of the iteration instead.
RedisModule_ZsetFirstInLexRange
int RedisModule_ZsetFirstInLexRange(RedisModuleKey *key,
RedisModuleString *min,
RedisModuleString *max);
Available since: 4.0.0
Setup a sorted set iterator seeking the first element in the specified lexicographical range. Returns REDISMODULE_OK if the iterator was correctly initialized, otherwise REDISMODULE_ERR is returned in the following conditions:
- The value stored at key is not a sorted set or the key is empty.
- The lexicographical range ‘min’ and ‘max’ format is invalid.
‘min’ and ‘max’ should be provided as two RedisModuleString objects in the same format as the parameters passed to the ZRANGEBYLEX command. The function does not take ownership of the objects, so they can be released as soon as the iterator is set up.
RedisModule_ZsetLastInLexRange
int RedisModule_ZsetLastInLexRange(RedisModuleKey *key,
RedisModuleString *min,
RedisModuleString *max);
Available since: 4.0.0
Exactly like RedisModule_ZsetFirstInLexRange() but the last element of the range is selected for the start of the iteration instead.
RedisModule_ZsetRangeCurrentElement
RedisModuleString *RedisModule_ZsetRangeCurrentElement(RedisModuleKey *key,
double *score);
Available since: 4.0.0
Return the current sorted set element of an active sorted set iterator or NULL if the range specified in the iterator does not include any element.
RedisModule_ZsetRangeNext
int RedisModule_ZsetRangeNext(RedisModuleKey *key);
Available since: 4.0.0
Go to the next element of the sorted set iterator. Returns 1 if there was a next element, 0 if we are already at the last element or the range does not include any item at all.
RedisModule_ZsetRangePrev
int RedisModule_ZsetRangePrev(RedisModuleKey *key);
Available since: 4.0.0
Go to the previous element of the sorted set iterator. Returns 1 if there was a previous element, 0 if we are already at the first element or the range does not include any item at all.
Key API for Hash type
See also RedisModule_ValueLength(), which returns the number of fields in a hash.
RedisModule_HashSet
int RedisModule_HashSet(RedisModuleKey *key, int flags, ...);
Available since: 4.0.0
Set the specified hash field to the specified value. If the key is an empty key open for writing, it is created with an empty hash value, in order to set the specified field.
The function is variadic and the user must specify pairs of field names and values, both as RedisModuleString pointers (unless the CFIELD option is set, see later). At the end of the field/value-ptr pairs, NULL must be specified as the last argument to signal the end of the arguments in the variadic function.
Example to set the hash argv[1] to the value argv[2]:
RedisModule_HashSet(key,REDISMODULE_HASH_NONE,argv[1],argv[2],NULL);
The function can also be used in order to delete fields (if they exist) by setting them to the special value REDISMODULE_HASH_DELETE:
RedisModule_HashSet(key,REDISMODULE_HASH_NONE,argv[1],
REDISMODULE_HASH_DELETE,NULL);
The behavior of the command changes with the specified flags, which can be set to REDISMODULE_HASH_NONE if no special behavior is needed.
- REDISMODULE_HASH_NX: The operation is performed only if the field does not already exist in the hash.
- REDISMODULE_HASH_XX: The operation is performed only if the field already exists, so that a new value can be associated with an existing field, but no new fields are created.
- REDISMODULE_HASH_CFIELDS: The field names passed are null-terminated C strings instead of RedisModuleString objects.
- REDISMODULE_HASH_COUNT_ALL: Include the number of inserted fields in the returned number, in addition to the number of updated and deleted fields. (Added in Redis 6.2.)
Unless NX is specified, the command overwrites the old field value with the new one.
When using REDISMODULE_HASH_CFIELDS, field names are given as normal C strings, so for example to delete the field “foo” the following code can be used:
RedisModule_HashSet(key,REDISMODULE_HASH_CFIELDS,"foo",
REDISMODULE_HASH_DELETE,NULL);
Return value: The number of fields existing in the hash prior to the call which have been updated (their old value replaced by a new one) or deleted. If the flag REDISMODULE_HASH_COUNT_ALL is set, inserted fields not previously existing in the hash are also counted.
If the return value is zero, errno is set (since Redis 6.2) as follows:
- EINVAL if any unknown flags are set or if key is NULL.
- ENOTSUP if the key is associated with a non-hash value.
- EBADF if the key was not opened for writing.
- ENOENT if no fields were counted as described under Return value above. This is not actually an error. The return value can be zero if all fields were just created and the COUNT_ALL flag was unset, or if changes were held back due to the NX and XX flags.
NOTICE: The return value semantics of this function are very different between Redis 6.2 and older versions. Modules that use it should determine the Redis version and handle it accordingly.
RedisModule_HashGet
int RedisModule_HashGet(RedisModuleKey *key, int flags, ...);
Available since: 4.0.0
Get fields from a hash value. This function is called using a variable number of arguments, alternating a field name (as a RedisModuleString pointer) with a pointer to a RedisModuleString pointer, which is set to the value of the field if the field exists, or NULL if the field does not exist. At the end of the field/value-ptr pairs, NULL must be specified as the last argument to signal the end of the arguments in the variadic function.
This is an example usage:
RedisModuleString *first, *second;
RedisModule_HashGet(mykey,REDISMODULE_HASH_NONE,argv[1],&first,
argv[2],&second,NULL);
As with RedisModule_HashSet(), the behavior of the command can be specified by passing flags different than REDISMODULE_HASH_NONE:
- REDISMODULE_HASH_CFIELDS: field names as null-terminated C strings.
- REDISMODULE_HASH_EXISTS: instead of setting the value of the field expecting a RedisModuleString pointer to pointer, the function just reports if the field exists or not and expects an integer pointer as the second element of each pair.
Example of REDISMODULE_HASH_CFIELDS:
RedisModuleString *username, *hashedpass;
RedisModule_HashGet(mykey,REDISMODULE_HASH_CFIELDS,"username",&username,"hp",&hashedpass, NULL);
Example of REDISMODULE_HASH_EXISTS:
int exists;
RedisModule_HashGet(mykey,REDISMODULE_HASH_EXISTS,argv[1],&exists,NULL);
The function returns REDISMODULE_OK on success and REDISMODULE_ERR if the key is not a hash value.
Memory management: The returned RedisModuleString objects should be released with RedisModule_FreeString(), or by enabling automatic memory management.
Key API for Stream type
For an introduction to streams, see https://redis.io/topics/streams-intro.
The type RedisModuleStreamID, which is used in stream functions, is a struct with two 64-bit fields and is defined as:
typedef struct RedisModuleStreamID {
uint64_t ms;
uint64_t seq;
} RedisModuleStreamID;
See also RedisModule_ValueLength(), which returns the length of a stream, and the conversion functions RedisModule_StringToStreamID() and RedisModule_CreateStringFromStreamID().
RedisModule_StreamAdd
int RedisModule_StreamAdd(RedisModuleKey *key,
int flags,
RedisModuleStreamID *id,
RedisModuleString **argv,
long numfields);
Available since: 6.2.0
Adds an entry to a stream. Like XADD without trimming.
- key: The key where the stream is (or will be) stored
- flags: A bit field of
  - REDISMODULE_STREAM_ADD_AUTOID: Assign a stream ID automatically, like * in the XADD command.
- id: If the AUTOID flag is set, this is where the assigned ID is returned. Can be NULL if AUTOID is set, if you don’t care to receive the ID. If AUTOID is not set, this is the requested ID.
- argv: A pointer to an array of size numfields * 2 containing the fields and values.
- numfields: The number of field-value pairs in argv.
Returns REDISMODULE_OK if an entry has been added. On failure, REDISMODULE_ERR is returned and errno is set as follows:
- EINVAL if called with invalid arguments
- ENOTSUP if the key refers to a value of a type other than stream
- EBADF if the key was not opened for writing
- EDOM if the given ID was 0-0 or not greater than all other IDs in the stream (only if the AUTOID flag is unset)
- EFBIG if the stream has reached the last possible ID
- ERANGE if the elements are too large to be stored.
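An illustrative sketch of adding an entry with an auto-generated ID (assumptions: key is open for writing, and field and value are RedisModuleString pointers making up one field-value pair):

```c
/* Append one field-value pair with an auto-generated ID,
 * like XADD key * field value. */
RedisModuleStreamID id;
RedisModuleString *argv[2] = {field, value}; /* numfields * 2 elements */
if (RedisModule_StreamAdd(key, REDISMODULE_STREAM_ADD_AUTOID,
                          &id, argv, 1) == REDISMODULE_OK) {
    /* id.ms and id.seq now hold the assigned stream ID. */
}
```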
RedisModule_StreamDelete
int RedisModule_StreamDelete(RedisModuleKey *key, RedisModuleStreamID *id);
Available since: 6.2.0
Deletes an entry from a stream.
- key: A key opened for writing, with no stream iterator started.
- id: The stream ID of the entry to delete.
Returns REDISMODULE_OK on success. On failure, REDISMODULE_ERR is returned and errno is set as follows:
- EINVAL if called with invalid arguments
- ENOTSUP if the key refers to a value of a type other than stream or if the key is empty
- EBADF if the key was not opened for writing or if a stream iterator is associated with the key
- ENOENT if no entry with the given stream ID exists
See also RedisModule_StreamIteratorDelete() for deleting the current entry while iterating using a stream iterator.
RedisModule_StreamIteratorStart
int RedisModule_StreamIteratorStart(RedisModuleKey *key,
int flags,
RedisModuleStreamID *start,
RedisModuleStreamID *end);
Available since: 6.2.0
Sets up a stream iterator.
- key: The stream key opened for reading using RedisModule_OpenKey().
- flags:
  - REDISMODULE_STREAM_ITERATOR_EXCLUSIVE: Don’t include start and end in the iterated range.
  - REDISMODULE_STREAM_ITERATOR_REVERSE: Iterate in reverse order, starting from the end of the range.
- start: The lower bound of the range. Use NULL for the beginning of the stream.
- end: The upper bound of the range. Use NULL for the end of the stream.
Returns REDISMODULE_OK on success. On failure, REDISMODULE_ERR is returned and errno is set as follows:
- EINVAL if called with invalid arguments
- ENOTSUP if the key refers to a value of a type other than stream or if the key is empty
- EBADF if the key was not opened for writing or if a stream iterator is already associated with the key
- EDOM if start or end is outside the valid range
Returns REDISMODULE_OK on success and REDISMODULE_ERR if the key doesn’t refer to a stream or if invalid arguments were given.
The stream IDs are retrieved using RedisModule_StreamIteratorNextID() and for each stream ID, the fields and values are retrieved using RedisModule_StreamIteratorNextField(). The iterator is freed by calling RedisModule_StreamIteratorStop().
Example (error handling omitted):
RedisModule_StreamIteratorStart(key, 0, startid_ptr, endid_ptr);
RedisModuleStreamID id;
long numfields;
while (RedisModule_StreamIteratorNextID(key, &id, &numfields) ==
       REDISMODULE_OK) {
    RedisModuleString *field, *value;
    while (RedisModule_StreamIteratorNextField(key, &field, &value) ==
           REDISMODULE_OK) {
        //
        // ... Do stuff ...
        //
        RedisModule_FreeString(ctx, field);
        RedisModule_FreeString(ctx, value);
    }
}
RedisModule_StreamIteratorStop(key);
RedisModule_StreamIteratorStop
int RedisModule_StreamIteratorStop(RedisModuleKey *key);
Available since: 6.2.0
Stops a stream iterator created using RedisModule_StreamIteratorStart() and reclaims its memory.
Returns REDISMODULE_OK on success. On failure, REDISMODULE_ERR is returned and errno is set as follows:
- EINVAL if called with a NULL key
- ENOTSUP if the key refers to a value of a type other than stream or if the key is empty
- EBADF if the key was not opened for writing or if no stream iterator is associated with the key
RedisModule_StreamIteratorNextID
int RedisModule_StreamIteratorNextID(RedisModuleKey *key,
RedisModuleStreamID *id,
long *numfields);
Available since: 6.2.0
Finds the next stream entry and returns its stream ID and the number of fields.
- key: Key for which a stream iterator has been started using RedisModule_StreamIteratorStart().
- id: The stream ID returned. NULL if you don’t care.
- numfields: The number of fields in the found stream entry. NULL if you don’t care.
Returns REDISMODULE_OK and sets *id and *numfields if an entry was found. On failure, REDISMODULE_ERR is returned and errno is set as follows:
- EINVAL if called with a NULL key
- ENOTSUP if the key refers to a value of a type other than stream or if the key is empty
- EBADF if no stream iterator is associated with the key
- ENOENT if there are no more entries in the range of the iterator
In practice, if RedisModule_StreamIteratorNextID() is called after a successful call to RedisModule_StreamIteratorStart() and with the same key, it is safe to assume that a REDISMODULE_ERR return value means that there are no more entries.
Use RedisModule_StreamIteratorNextField() to retrieve the fields and values. See the example at RedisModule_StreamIteratorStart().
RedisModule_StreamIteratorNextField
int RedisModule_StreamIteratorNextField(RedisModuleKey *key,
RedisModuleString **field_ptr,
RedisModuleString **value_ptr);
Available since: 6.2.0
Retrieves the next field of the current stream ID and its corresponding value in a stream iteration. This function should be called repeatedly after calling RedisModule_StreamIteratorNextID() to fetch each field-value pair.
- key: Key where a stream iterator has been started.
- field_ptr: This is where the field is returned.
- value_ptr: This is where the value is returned.
Returns REDISMODULE_OK and points *field_ptr and *value_ptr to freshly allocated RedisModuleString objects. The string objects are freed automatically when the callback finishes if automatic memory management is enabled. On failure, REDISMODULE_ERR is returned and errno is set as follows:
- EINVAL if called with a NULL key
- ENOTSUP if the key refers to a value of a type other than stream or if the key is empty
- EBADF if no stream iterator is associated with the key
- ENOENT if there are no more fields in the current stream entry
In practice, if RedisModule_StreamIteratorNextField() is called after a successful call to RedisModule_StreamIteratorNextID() and with the same key, it is safe to assume that a REDISMODULE_ERR return value means that there are no more fields.
See the example at RedisModule_StreamIteratorStart().
RedisModule_StreamIteratorDelete
int RedisModule_StreamIteratorDelete(RedisModuleKey *key);
Available since: 6.2.0
Deletes the current stream entry while iterating.
This function can be called after RedisModule_StreamIteratorNextID() or after any calls to RedisModule_StreamIteratorNextField().
Returns REDISMODULE_OK on success. On failure, REDISMODULE_ERR is returned and errno is set as follows:
- EINVAL if key is NULL
- ENOTSUP if the key is empty or is of another type than stream
- EBADF if the key is not opened for writing or if no iterator has been started
- ENOENT if the iterator has no current stream entry
RedisModule_StreamTrimByLength
long long RedisModule_StreamTrimByLength(RedisModuleKey *key,
int flags,
long long length);
Available since: 6.2.0
Trim a stream by length, similar to XTRIM with MAXLEN.
- key: Key opened for writing.
- flags: A bitfield of
  - REDISMODULE_STREAM_TRIM_APPROX: Trim less if it improves performance, like XTRIM with ~.
- length: The number of stream entries to keep after trimming.
Returns the number of entries deleted. On failure, a negative value is returned and errno is set as follows:
- EINVAL if called with invalid arguments
- ENOTSUP if the key is empty or of a type other than stream
- EBADF if the key is not opened for writing
RedisModule_StreamTrimByID
long long RedisModule_StreamTrimByID(RedisModuleKey *key,
int flags,
RedisModuleStreamID *id);
Available since: 6.2.0
Trim a stream by ID, similar to XTRIM with MINID.
- key: Key opened for writing.
- flags: A bitfield of
  - REDISMODULE_STREAM_TRIM_APPROX: Trim less if it improves performance, like XTRIM with ~.
- id: The smallest stream ID to keep after trimming.
Returns the number of entries deleted. On failure, a negative value is returned and errno is set as follows:
- EINVAL if called with invalid arguments
- ENOTSUP if the key is empty or of a type other than stream
- EBADF if the key is not opened for writing
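An illustrative sketch of approximate trimming by length (assumes key is a stream key open for writing):

```c
/* Keep roughly the latest 1000 entries, like XTRIM key MAXLEN ~ 1000. */
long long deleted =
    RedisModule_StreamTrimByLength(key, REDISMODULE_STREAM_TRIM_APPROX, 1000);
if (deleted < 0) {
    /* EINVAL, ENOTSUP or EBADF is in errno. */
}
```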
Calling Redis commands from modules
RedisModule_Call() sends a command to Redis. The remaining functions handle the reply.
RedisModule_FreeCallReply
void RedisModule_FreeCallReply(RedisModuleCallReply *reply);
Available since: 4.0.0
Free a Call reply and all the nested replies it contains if it’s an array.
RedisModule_CallReplyType
int RedisModule_CallReplyType(RedisModuleCallReply *reply);
Available since: 4.0.0
Return the reply type as one of the following:
REDISMODULE_REPLY_UNKNOWN
REDISMODULE_REPLY_STRING
REDISMODULE_REPLY_ERROR
REDISMODULE_REPLY_INTEGER
REDISMODULE_REPLY_ARRAY
REDISMODULE_REPLY_NULL
REDISMODULE_REPLY_MAP
REDISMODULE_REPLY_SET
REDISMODULE_REPLY_BOOL
REDISMODULE_REPLY_DOUBLE
REDISMODULE_REPLY_BIG_NUMBER
REDISMODULE_REPLY_VERBATIM_STRING
REDISMODULE_REPLY_ATTRIBUTE
RedisModule_CallReplyLength
size_t RedisModule_CallReplyLength(RedisModuleCallReply *reply);
Available since: 4.0.0
Return the reply type length, where applicable.
RedisModule_CallReplyArrayElement
RedisModuleCallReply *RedisModule_CallReplyArrayElement(RedisModuleCallReply *reply,
size_t idx);
Available since: 4.0.0
Return the ‘idx’-th nested call reply element of an array reply, or NULL if the reply type is wrong or the index is out of range.
RedisModule_CallReplyInteger
long long RedisModule_CallReplyInteger(RedisModuleCallReply *reply);
Available since: 4.0.0
Return the long long of an integer reply.
RedisModule_CallReplyDouble
double RedisModule_CallReplyDouble(RedisModuleCallReply *reply);
Return the double value of a double reply.
RedisModule_CallReplyBigNumber
const char *RedisModule_CallReplyBigNumber(RedisModuleCallReply *reply,
size_t *len);
Return the big number value of a big number reply.
RedisModule_CallReplyVerbatim
const char *RedisModule_CallReplyVerbatim(RedisModuleCallReply *reply,
size_t *len,
const char **format);
Return the value of a verbatim string reply. An optional output argument can be given to get the verbatim reply format.
RedisModule_CallReplyBool
int RedisModule_CallReplyBool(RedisModuleCallReply *reply);
Return the Boolean value of a Boolean reply.
RedisModule_CallReplySetElement
RedisModuleCallReply *RedisModule_CallReplySetElement(RedisModuleCallReply *reply,
size_t idx);
Return the ‘idx’-th nested call reply element of a set reply, or NULL if the reply type is wrong or the index is out of range.
RedisModule_CallReplyMapElement
int RedisModule_CallReplyMapElement(RedisModuleCallReply *reply,
size_t idx,
RedisModuleCallReply **key,
RedisModuleCallReply **val);
Retrieve the ‘idx’-th key and value of a map reply.
Returns:
- REDISMODULE_OK on success.
- REDISMODULE_ERR if idx is out of range or if the reply type is wrong.
The key and value arguments are used to return by reference, and may be NULL if not required.
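An illustrative sketch of walking a map reply, e.g. one obtained from RedisModule_Call() with the ‘3’ format modifier (assumes reply is a map reply):

```c
/* Walk a RESP3 map reply pair by pair. */
size_t n = RedisModule_CallReplyLength(reply);
for (size_t i = 0; i < n; i++) {
    RedisModuleCallReply *k, *v;
    if (RedisModule_CallReplyMapElement(reply, i, &k, &v) == REDISMODULE_OK) {
        /* ... inspect k and v, e.g. with RedisModule_CallReplyStringPtr() ... */
    }
}
```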
RedisModule_CallReplyAttribute
RedisModuleCallReply *RedisModule_CallReplyAttribute(RedisModuleCallReply *reply);
Return the attribute of the given reply, or NULL if no attribute exists.
RedisModule_CallReplyAttributeElement
int RedisModule_CallReplyAttributeElement(RedisModuleCallReply *reply,
size_t idx,
RedisModuleCallReply **key,
RedisModuleCallReply **val);
Retrieve the ‘idx’-th key and value of an attribute reply.
Returns:
- REDISMODULE_OK on success.
- REDISMODULE_ERR if idx is out of range or if the reply type is wrong.
The key and value arguments are used to return by reference, and may be NULL if not required.
RedisModule_CallReplyStringPtr
const char *RedisModule_CallReplyStringPtr(RedisModuleCallReply *reply,
size_t *len);
Available since: 4.0.0
Return the pointer and length of a string or error reply.
RedisModule_CreateStringFromCallReply
RedisModuleString *RedisModule_CreateStringFromCallReply(RedisModuleCallReply *reply);
Available since: 4.0.0
Return a new string object from a call reply of type string, error or integer. Otherwise (wrong reply type) return NULL.
RedisModule_Call
RedisModuleCallReply *RedisModule_Call(RedisModuleCtx *ctx,
const char *cmdname,
const char *fmt,
...);
Available since: 4.0.0
Exported API to call any Redis command from modules.
- cmdname: The Redis command to call.
- fmt: A format specifier string for the command’s arguments. Each of the arguments should be specified by a valid type specification. The format specifier can also contain the modifiers !, A, R, 3, 0 and C, which don’t have a corresponding argument.
  - b – The argument is a buffer and is immediately followed by another argument that is the buffer’s length.
  - c – The argument is a pointer to a plain C string (null-terminated).
  - l – The argument is a long long integer.
  - s – The argument is a RedisModuleString.
  - v – The argument(s) is a vector of RedisModuleString.
  - ! – Sends the Redis command and its arguments to replicas and AOF.
  - A – Suppress AOF propagation, send only to replicas (requires !).
  - R – Suppress replicas propagation, send only to AOF (requires !).
  - 3 – Return a RESP3 reply. This will change the command reply, e.g., HGETALL returns a map instead of a flat array.
  - 0 – Return the reply in auto mode, i.e. the reply format will be the same as the client attached to the given RedisModuleCtx. This will probably be used when you want to pass the reply directly to the client.
  - C – Check if the command can be executed according to ACL rules.
- …: The actual arguments to the Redis command.
On success a RedisModuleCallReply object is returned, otherwise NULL is returned and errno is set to the following values:
- EBADF: wrong format specifier.
- EINVAL: wrong command arity.
- ENOENT: command does not exist.
- EPERM: operation in Cluster instance with key in non local slot.
- EROFS: operation in Cluster instance when a write command is sent in a readonly state.
- ENETDOWN: operation in Cluster instance when cluster is down.
- ENOTSUP: No ACL user for the specified module context
- EACCES: Command cannot be executed, according to ACL rules
Example code fragment:
reply = RedisModule_Call(ctx,"INCRBY","sc",argv[1],"10");
if (RedisModule_CallReplyType(reply) == REDISMODULE_REPLY_INTEGER) {
    long long myval = RedisModule_CallReplyInteger(reply);
    // Do something with myval.
}
This API is documented here: https://redis.io/topics/modules-intro
RedisModule_CallReplyProto
const char *RedisModule_CallReplyProto(RedisModuleCallReply *reply,
size_t *len);
Available since: 4.0.0
Return a pointer, and a length, to the protocol returned by the command that returned the reply object.
Modules data types
When String DMA or using existing data structures is not enough, it is possible to create new data types from scratch and export them to Redis. The module must provide a set of callbacks for handling the new values exported (for example in order to provide RDB saving/loading, AOF rewrite, and so forth). In this section we define this API.
RedisModule_CreateDataType
moduleType *RedisModule_CreateDataType(RedisModuleCtx *ctx,
const char *name,
int encver,
void *typemethods_ptr);
Available since: 4.0.0
Register a new data type exported by the module. The parameters are the following. For in-depth documentation, please check the modules API documentation, especially https://redis.io/topics/modules-native-types.
- name: A 9 characters data type name that MUST be unique in the Redis Modules ecosystem. Be creative… and there will be no collisions. Use the charset A-Z a-z 9-0, plus the two “-_” characters. A good idea is to use, for example, <typename>-<vendor>. For example “tree-AntZ” may mean “Tree data structure by @antirez”. Using both lower case and upper case letters helps in order to prevent collisions.
- encver: Encoding version, that is, the version of the serialization that a module used in order to persist data. As long as the “name” matches, the RDB loading will be dispatched to the type callbacks whatever ‘encver’ is used, however the module can understand if the encoding it must load is of an older version of the module. For example the module “tree-AntZ” initially used encver=0. Later after an upgrade, it started to serialize data in a different format and to register the type with encver=1. However this module may still load old data produced by an older version if the rdb_load callback is able to check the encver value and act accordingly. The encver must be a positive value between 0 and 1023.
- typemethods_ptr is a pointer to a RedisModuleTypeMethods structure that should be populated with the methods callbacks and structure version, like in the following example:

RedisModuleTypeMethods tm = {
    .version = REDISMODULE_TYPE_METHOD_VERSION,
    .rdb_load = myType_RDBLoadCallBack,
    .rdb_save = myType_RDBSaveCallBack,
    .aof_rewrite = myType_AOFRewriteCallBack,
    .free = myType_FreeCallBack,

    // Optional fields
    .digest = myType_DigestCallBack,
    .mem_usage = myType_MemUsageCallBack,
    .aux_load = myType_AuxRDBLoadCallBack,
    .aux_save = myType_AuxRDBSaveCallBack,
    .free_effort = myType_FreeEffortCallBack,
    .unlink = myType_UnlinkCallBack,
    .copy = myType_CopyCallback,
    .defrag = myType_DefragCallback,

    // Enhanced optional fields
    .mem_usage2 = myType_MemUsageCallBack2,
    .free_effort2 = myType_FreeEffortCallBack2,
    .unlink2 = myType_UnlinkCallBack2,
    .copy2 = myType_CopyCallback2
}
-
rdb_load: A callback function pointer that loads data from RDB files.
-
rdb_save: A callback function pointer that saves data to RDB files.
-
aof_rewrite: A callback function pointer that rewrites data as commands.
-
digest: A callback function pointer that is used for
DEBUG DIGEST
. -
free: A callback function pointer that can free a type value.
-
aux_save: A callback function pointer that saves out of keyspace data to RDB files. ‘when’ argument is either
REDISMODULE_AUX_BEFORE_RDB
orREDISMODULE_AUX_AFTER_RDB
. -
aux_load: A callback function pointer that loads out of keyspace data from RDB files. Similar to
aux_save
, returnsREDISMODULE_OK
on success, and ERR otherwise. -
free_effort: A callback function pointer that used to determine whether the module’s memory needs to be lazy reclaimed. The module should return the complexity involved by freeing the value. for example: how many pointers are gonna be freed. Note that if it returns 0, we’ll always do an async free.
-
unlink: A callback function pointer that used to notifies the module that the key has been removed from the DB by redis, and may soon be freed by a background thread. Note that it won’t be called on FLUSHALL/FLUSHDB (both sync and async), and the module can use the
RedisModuleEvent_FlushDB
to hook into that. -
copy: A callback function pointer that is used to make a copy of the specified key. The module is expected to perform a deep copy of the specified value and return it. In addition, hints about the names of the source and destination keys is provided. A NULL return value is considered an error and the copy operation fails. Note: if the target key exists and is being overwritten, the copy callback will be called first, followed by a free callback to the value that is being replaced.
-
defrag: A callback function pointer that is used to request the module to defrag a key. The module should then iterate pointers and call the relevant
RedisModule_Defrag*()
functions to defragment pointers or complex types. The module should continue iterating as long as RedisModule_DefragShouldStop()
returns a zero value, and return a zero value if finished or a non-zero value if more work is left to be done. If more work needs to be done, RedisModule_DefragCursorSet()
and RedisModule_DefragCursorGet()
can be used to track this work across different calls. Normally, the defrag mechanism invokes the callback without a time limit, so RedisModule_DefragShouldStop()
always returns zero. The “late defrag” mechanism, which has a time limit and provides cursor support, is used only for keys that are determined to have significant internal complexity. To determine this, the defrag mechanism uses the free_effort
callback and the ‘active-defrag-max-scan-fields’ config directive. NOTE: The value is passed as a void**
and the function is expected to update the pointer if the top-level value pointer is defragmented and consequently changes. -
mem_usage2: Similar to
mem_usage
, but provides the RedisModuleKeyOptCtx
parameter so that meta information such as key name and db id can be obtained, and the sample_size
for size estimation (see MEMORY USAGE command). -
free_effort2: Similar to
free_effort
, but provides the RedisModuleKeyOptCtx
parameter so that meta information such as key name and db id can be obtained. -
unlink2: Similar to
unlink
, but provides the RedisModuleKeyOptCtx
parameter so that meta information such as key name and db id can be obtained. -
copy2: Similar to
copy
, but provides the RedisModuleKeyOptCtx
parameter so that meta information such as key names and db ids can be obtained.
Note: the module name “AAAAAAAAA” is reserved and produces an error; it happens to be pretty lame as well.
If there is already a module registering a type with the same name,
or if the module name or encver is invalid, NULL is returned.
Otherwise the new type is registered into Redis, and a reference of
type RedisModuleType
is returned: the caller of the function should store
this reference into a global variable to make future use of it in the
modules type API, since a single module may register multiple types.
Example code fragment:
static RedisModuleType *BalancedTreeType;
int RedisModule_OnLoad(RedisModuleCtx *ctx) {
// some code here ...
BalancedTreeType = RM_CreateDataType(...);
}
RedisModule_ModuleTypeSetValue
int RedisModule_ModuleTypeSetValue(RedisModuleKey *key,
moduleType *mt,
void *value);
Available since: 4.0.0
If the key is open for writing, set the specified module type object
as the value of the key, deleting the old value if any.
On success REDISMODULE_OK
is returned. If the key is not open for
writing or there is an active iterator, REDISMODULE_ERR
is returned.
RedisModule_ModuleTypeGetType
moduleType *RedisModule_ModuleTypeGetType(RedisModuleKey *key);
Available since: 4.0.0
Assuming RedisModule_KeyType()
returned REDISMODULE_KEYTYPE_MODULE
on
the key, returns the module type pointer of the value stored at key.
If the key is NULL, is not associated with a module type, or is empty, then NULL is returned instead.
RedisModule_ModuleTypeGetValue
void *RedisModule_ModuleTypeGetValue(RedisModuleKey *key);
Available since: 4.0.0
Assuming RedisModule_KeyType()
returned REDISMODULE_KEYTYPE_MODULE
on
the key, returns the module type low-level value stored at key, as
it was set by the user via RedisModule_ModuleTypeSetValue()
.
If the key is NULL, is not associated with a module type, or is empty, then NULL is returned instead.
RDB loading and saving functions
RedisModule_IsIOError
int RedisModule_IsIOError(RedisModuleIO *io);
Available since: 6.0.0
Returns true if any previous IO API call failed.
For Load*
APIs, the REDISMODULE_OPTIONS_HANDLE_IO_ERRORS
flag must be set with
RedisModule_SetModuleOptions
first.
RedisModule_SaveUnsigned
void RedisModule_SaveUnsigned(RedisModuleIO *io, uint64_t value);
Available since: 4.0.0
Save an unsigned 64 bit value into the RDB file. This function should only
be called in the context of the rdb_save
method of modules implementing new
data types.
RedisModule_LoadUnsigned
uint64_t RedisModule_LoadUnsigned(RedisModuleIO *io);
Available since: 4.0.0
Load an unsigned 64 bit value from the RDB file. This function should only
be called in the context of the rdb_load
method of modules implementing
new data types.
RedisModule_SaveSigned
void RedisModule_SaveSigned(RedisModuleIO *io, int64_t value);
Available since: 4.0.0
Like RedisModule_SaveUnsigned()
but for signed 64 bit values.
RedisModule_LoadSigned
int64_t RedisModule_LoadSigned(RedisModuleIO *io);
Available since: 4.0.0
Like RedisModule_LoadUnsigned()
but for signed 64 bit values.
RedisModule_SaveString
void RedisModule_SaveString(RedisModuleIO *io, RedisModuleString *s);
Available since: 4.0.0
In the context of the rdb_save
method of a module type, saves a
string into the RDB file taking as input a RedisModuleString
.
The string can be later loaded with RedisModule_LoadString()
or
other Load family functions expecting a serialized string inside
the RDB file.
RedisModule_SaveStringBuffer
void RedisModule_SaveStringBuffer(RedisModuleIO *io,
const char *str,
size_t len);
Available since: 4.0.0
Like RedisModule_SaveString()
but takes a raw C pointer and length
as input.
RedisModule_LoadString
RedisModuleString *RedisModule_LoadString(RedisModuleIO *io);
Available since: 4.0.0
In the context of the rdb_load
method of a module data type, loads a string
from the RDB file, that was previously saved with RedisModule_SaveString()
functions family.
The returned string is a newly allocated RedisModuleString
object, and
the user should at some point free it with a call to RedisModule_FreeString()
.
If the data structure does not store strings as RedisModuleString
objects,
the similar function RedisModule_LoadStringBuffer()
could be used instead.
RedisModule_LoadStringBuffer
char *RedisModule_LoadStringBuffer(RedisModuleIO *io, size_t *lenptr);
Available since: 4.0.0
Like RedisModule_LoadString()
but returns a heap-allocated string that
was allocated with RedisModule_Alloc()
, and can be resized or freed with
RedisModule_Realloc()
or RedisModule_Free()
.
The size of the string is stored at ‘*lenptr’ if not NULL. The returned string is not automatically NULL-terminated; it is loaded exactly as it was stored inside the RDB file.
RedisModule_SaveDouble
void RedisModule_SaveDouble(RedisModuleIO *io, double value);
Available since: 4.0.0
In the context of the rdb_save
method of a module data type, saves a double
value to the RDB file. The double can be a valid number, a NaN or infinity.
It is possible to load back the value with RedisModule_LoadDouble()
.
RedisModule_LoadDouble
double RedisModule_LoadDouble(RedisModuleIO *io);
Available since: 4.0.0
In the context of the rdb_load
method of a module data type, loads back the
double value saved by RedisModule_SaveDouble()
.
RedisModule_SaveFloat
void RedisModule_SaveFloat(RedisModuleIO *io, float value);
Available since: 4.0.0
In the context of the rdb_save
method of a module data type, saves a float
value to the RDB file. The float can be a valid number, a NaN or infinity.
It is possible to load back the value with RedisModule_LoadFloat()
.
RedisModule_LoadFloat
float RedisModule_LoadFloat(RedisModuleIO *io);
Available since: 4.0.0
In the context of the rdb_load
method of a module data type, loads back the
float value saved by RedisModule_SaveFloat()
.
RedisModule_SaveLongDouble
void RedisModule_SaveLongDouble(RedisModuleIO *io, long double value);
Available since: 6.0.0
In the context of the rdb_save
method of a module data type, saves a long double
value to the RDB file. The long double can be a valid number, a NaN or infinity.
It is possible to load back the value with RedisModule_LoadLongDouble()
.
RedisModule_LoadLongDouble
long double RedisModule_LoadLongDouble(RedisModuleIO *io);
Available since: 6.0.0
In the context of the rdb_load
method of a module data type, loads back the
long double value saved by RedisModule_SaveLongDouble()
.
Key digest API (DEBUG DIGEST interface for modules types)
RedisModule_DigestAddStringBuffer
void RedisModule_DigestAddStringBuffer(RedisModuleDigest *md,
const char *ele,
size_t len);
Available since: 4.0.0
Add a new element to the digest. This function can be called multiple times,
one element after the other, for all the elements that constitute a given
data structure. Eventually the calls must be followed by a call to
RedisModule_DigestEndSequence
, once all the elements that are
always in a given order have been added. See the Redis Modules data types
documentation for more info. Below is a quick example that uses Redis
data types for illustration.
To add a sequence of unordered elements (for example in the case of a Redis Set), the pattern to use is:
foreach element {
AddElement(element);
EndSequence();
}
Sets are not ordered, so every element added has a position that does not depend on the others. However, if our elements are instead ordered in pairs, like the field-value pairs of a Hash, then one should use:
foreach key,value {
AddElement(key);
AddElement(value);
EndSequence();
}
The key and value will always be in the above order, while the individual key-value pairs can appear in any position within a Redis hash.
A list of ordered elements would be implemented with:
foreach element {
AddElement(element);
}
EndSequence();
RedisModule_DigestAddLongLong
void RedisModule_DigestAddLongLong(RedisModuleDigest *md, long long ll);
Available since: 4.0.0
Like RedisModule_DigestAddStringBuffer()
but takes a long long as input
that gets converted into a string before adding it to the digest.
RedisModule_DigestEndSequence
void RedisModule_DigestEndSequence(RedisModuleDigest *md);
Available since: 4.0.0
See the documentation for RedisModule_DigestAddStringBuffer()
.
RedisModule_LoadDataTypeFromStringEncver
void *RedisModule_LoadDataTypeFromStringEncver(const RedisModuleString *str,
const moduleType *mt,
int encver);
Decode a serialized representation of a module data type ‘mt’, in a specific encoding version ‘encver’ from string ‘str’ and return a newly allocated value, or NULL if decoding failed.
This call basically reuses the ‘rdb_load
’ callback which module data types
implement in order to allow a module to arbitrarily serialize/de-serialize
keys, similar to how the Redis ‘DUMP’ and ‘RESTORE’ commands are implemented.
Modules should generally use the REDISMODULE_OPTIONS_HANDLE_IO_ERRORS
flag and
make sure the de-serialization code properly checks and handles IO errors
(freeing allocated buffers and returning a NULL).
If this is NOT done, Redis will handle corrupted (or just truncated) serialized data by producing an error message and terminating the process.
RedisModule_LoadDataTypeFromString
void *RedisModule_LoadDataTypeFromString(const RedisModuleString *str,
const moduleType *mt);
Available since: 6.0.0
Similar to RedisModule_LoadDataTypeFromStringEncver
; this is the original version of the API, kept
for backward compatibility.
RedisModule_SaveDataTypeToString
RedisModuleString *RedisModule_SaveDataTypeToString(RedisModuleCtx *ctx,
void *data,
const moduleType *mt);
Available since: 6.0.0
Encode a module data type ‘mt’ value ‘data’ into serialized form, and return it
as a newly allocated RedisModuleString
.
This call basically reuses the ‘rdb_save
’ callback which module data types
implement in order to allow a module to arbitrarily serialize/de-serialize
keys, similar to how the Redis ‘DUMP’ and ‘RESTORE’ commands are implemented.
RedisModule_GetKeyNameFromDigest
const RedisModuleString *RedisModule_GetKeyNameFromDigest(RedisModuleDigest *dig);
Returns the name of the key currently being processed.
RedisModule_GetDbIdFromDigest
int RedisModule_GetDbIdFromDigest(RedisModuleDigest *dig);
Returns the database id of the key currently being processed.
AOF API for modules data types
RedisModule_EmitAOF
void RedisModule_EmitAOF(RedisModuleIO *io,
const char *cmdname,
const char *fmt,
...);
Available since: 4.0.0
Emits a command into the AOF during the AOF rewriting process. This function
is only called in the context of the aof_rewrite
method of data types exported
by a module. The command works exactly like RedisModule_Call()
in the way
the parameters are passed, but it does not return anything as the error
handling is performed by Redis itself.
IO context handling
RedisModule_GetKeyNameFromIO
const RedisModuleString *RedisModule_GetKeyNameFromIO(RedisModuleIO *io);
Available since: 5.0.5
Returns the name of the key currently being processed. There is no guarantee that the key name is always available, so this may return NULL.
RedisModule_GetKeyNameFromModuleKey
const RedisModuleString *RedisModule_GetKeyNameFromModuleKey(RedisModuleKey *key);
Available since: 6.0.0
Returns a RedisModuleString
with the name of the key from RedisModuleKey
.
RedisModule_GetDbIdFromModuleKey
int RedisModule_GetDbIdFromModuleKey(RedisModuleKey *key);
Returns the database id of the key from RedisModuleKey
.
RedisModule_GetDbIdFromIO
int RedisModule_GetDbIdFromIO(RedisModuleIO *io);
Returns the database id of the key currently being processed. There is no guarantee that this info is always available, so this may return -1.
Logging
RedisModule_Log
void RedisModule_Log(RedisModuleCtx *ctx,
const char *levelstr,
const char *fmt,
...);
Available since: 4.0.0
Produces a log message to the standard Redis log. The format accepts printf-like specifiers, while level is a string describing the log level to use when emitting the log, and must be one of the following:
- “debug” (REDISMODULE_LOGLEVEL_DEBUG)
- “verbose” (REDISMODULE_LOGLEVEL_VERBOSE)
- “notice” (REDISMODULE_LOGLEVEL_NOTICE)
- “warning” (REDISMODULE_LOGLEVEL_WARNING)
If the specified log level is invalid, verbose is used by default. There is a fixed limit to the length of the log line this function is able to emit; the limit is not specified, but it is guaranteed to be more than a few lines of text.
The ctx argument may be NULL if it cannot be provided in the context of the caller, for instance in threads or callbacks, in which case the generic name “module” will be used instead of the module name.
RedisModule_LogIOError
void RedisModule_LogIOError(RedisModuleIO *io,
const char *levelstr,
const char *fmt,
...);
Available since: 4.0.0
Log errors from RDB / AOF serialization callbacks.
This function should be used when a callback is returning a critical error to the caller, since it cannot load or save the data for some critical reason.
RedisModule__Assert
void RedisModule__Assert(const char *estr, const char *file, int line);
Available since: 6.0.0
Redis-like assert function.
The macro RedisModule_Assert(expression)
is recommended, rather than
calling this function directly.
A failed assertion will shut down the server and produce logging information that looks identical to information generated by Redis itself.
RedisModule_LatencyAddSample
void RedisModule_LatencyAddSample(const char *event, mstime_t latency);
Available since: 6.0.0
Adds an event to the latency monitor, to be observed by the LATENCY command. The call is skipped if the latency is smaller than the configured latency-monitor-threshold.
Blocking clients from modules
For a guide about blocking commands in modules, see https://redis.io/topics/modules-blocking-ops.
RedisModule_BlockClient
RedisModuleBlockedClient *RedisModule_BlockClient(RedisModuleCtx *ctx,
RedisModuleCmdFunc reply_callback,
RedisModuleCmdFunc timeout_callback,
void (*free_privdata)(RedisModuleCtx*, void*),
long long timeout_ms);
Available since: 4.0.0
Block a client in the context of a blocking command, returning a handle
which will be used, later, in order to unblock the client with a call to
RedisModule_UnblockClient()
. The arguments specify callback functions
and a timeout after which the client is unblocked.
The callbacks are called in the following contexts:
reply_callback: called after a successful RedisModule_UnblockClient()
call in order to reply to the client and unblock it.
timeout_callback: called when the timeout is reached or if `CLIENT UNBLOCK`
is invoked, in order to send an error to the client.
free_privdata: called in order to free the private data that is passed
by RedisModule_UnblockClient() call.
Note: RedisModule_UnblockClient
should be called for every blocked client,
even if the client was killed, timed out or disconnected. Failing to do so
will result in memory leaks.
There are some cases where RedisModule_BlockClient()
cannot be used:
- If the client is a Lua script.
- If the client is executing a MULTI block.
In these cases, a call to RedisModule_BlockClient()
will not block the
client, but instead produce a specific error reply.
A module that registers a timeout_callback
function can also be unblocked
using the CLIENT UNBLOCK
command, which will trigger the timeout callback.
If a callback function is not registered, then the blocked client will be
treated as if it is not in a blocked state and CLIENT UNBLOCK
will return
a zero value.
Measuring background time: By default the time spent in the blocked command
is not accounted for in the total command duration. To include such time you should
use RedisModule_BlockedClientMeasureTimeStart()
and RedisModule_BlockedClientMeasureTimeEnd()
one
or multiple times within the blocking command's background work.
RedisModule_BlockClientOnKeys
RedisModuleBlockedClient *RedisModule_BlockClientOnKeys(RedisModuleCtx *ctx,
RedisModuleCmdFunc reply_callback,
RedisModuleCmdFunc timeout_callback,
void (*free_privdata)(RedisModuleCtx*, void*),
long long timeout_ms,
RedisModuleString **keys,
int numkeys,
void *privdata);
Available since: 6.0.0
This call is similar to RedisModule_BlockClient()
, however in this case we
don’t just block the client, but also ask Redis to unblock it automatically
once certain keys become “ready”, that is, contain more data.
Basically this is similar to what a typical Redis command usually does, like BLPOP or BZPOPMAX: the client blocks if it cannot be served ASAP, and later when the key receives new data (a list push for instance), the client is unblocked and served.
However, in the case of this module API, when is the client unblocked?
- If you block on a key of a type that has blocking operations associated, like a list, a sorted set, a stream, and so forth, the client may be unblocked once the relevant key is targeted by an operation that normally unblocks the native blocking operations for that type. So if we block on a list key, an RPUSH command may unblock our client and so forth.
- If you are implementing your own native data type, or if you want to add new
unblocking conditions in addition to the ones above, you can call the modules API
RedisModule_SignalKeyAsReady()
.
In any case, we can’t be sure that the client should be unblocked just because the
key is signaled as ready: for instance a successive operation may change the
key, or a client in queue before this one can be served, modifying the key
as well and making it empty again. So when a client is blocked with
RedisModule_BlockClientOnKeys()
the reply callback is not called after
RedisModule_UnblockClient()
is called, but every time a key is signaled as ready:
if the reply callback can serve the client, it returns REDISMODULE_OK
and the client is unblocked, otherwise it will return REDISMODULE_ERR
and we’ll try again later.
The reply callback can access the key that was signaled as ready by
calling the API RedisModule_GetBlockedClientReadyKey()
, that returns
just the string name of the key as a RedisModuleString
object.
Thanks to this system we can set up complex blocking scenarios, like unblocking a client only if a list contains at least 5 items, or other fancier logic.
Note that another difference with RedisModule_BlockClient()
, is that here
we pass the private data directly when blocking the client: it will
be accessible later in the reply callback. Normally when blocking with
RedisModule_BlockClient()
the private data to reply to the client is
passed when calling RedisModule_UnblockClient()
but here the unblocking
is performed by Redis itself, so we need to have some private data beforehand.
The private data is used to store any information about the specific
unblocking operation that you are implementing. Such information will be
freed using the free_privdata
callback provided by the user.
However the reply callback will be able to access the argument vector of the command, so the private data is often not needed.
Note: Under normal circumstances RedisModule_UnblockClient
should not be
called for clients that are blocked on keys (Either the key will
become ready or a timeout will occur). If for some reason you do want
to call RedisModule_UnblockClient it is possible: the client will be
handled as if it had timed out (you must implement the timeout
callback in that case).
RedisModule_SignalKeyAsReady
void RedisModule_SignalKeyAsReady(RedisModuleCtx *ctx, RedisModuleString *key);
Available since: 6.0.0
This function is used in order to potentially unblock a client blocked
on keys with RedisModule_BlockClientOnKeys()
. When this function is called,
all the clients blocked for this key will get their reply_callback
called.
Note: The function has no effect if the signaled key doesn’t exist.
RedisModule_UnblockClient
int RedisModule_UnblockClient(RedisModuleBlockedClient *bc, void *privdata);
Available since: 4.0.0
Unblock a client blocked by RedisModule_BlockClient
. This will trigger
the reply callbacks to be called in order to reply to the client.
The ‘privdata’ argument will be accessible by the reply callback, so
the caller of this function can pass any value that is needed in order to
actually reply to the client.
A common usage for ‘privdata’ is a thread that computes something that needs to be passed to the client, including but not limited to a slow-to-compute reply or a reply obtained via networking.
Note 1: this function can be called from threads spawned by the module.
Note 2: when we unblock a client that is blocked for keys using the API
RedisModule_BlockClientOnKeys()
, the privdata argument here is not used.
Unblocking a client that was blocked for keys using this API will still
require the client to get some reply, so the function will use the
“timeout” handler in order to do so (The privdata provided in
RedisModule_BlockClientOnKeys()
is accessible from the timeout
callback via RedisModule_GetBlockedClientPrivateData
).
RedisModule_AbortBlock
int RedisModule_AbortBlock(RedisModuleBlockedClient *bc);
Available since: 4.0.0
Abort a blocked client blocking operation: the client will be unblocked without firing any callback.
RedisModule_SetDisconnectCallback
void RedisModule_SetDisconnectCallback(RedisModuleBlockedClient *bc,
RedisModuleDisconnectFunc callback);
Available since: 5.0.0
Set a callback that will be called if a blocked client disconnects
before the module has a chance to call RedisModule_UnblockClient()
.
Usually what you want to do there is clean up your module state
so that you can call RedisModule_UnblockClient()
safely; otherwise
the client will remain blocked forever if the timeout is large.
Notes:
-
It is not safe to call the Reply* family of functions here; it is also useless since the client is gone.
-
This callback is not called if the client disconnects because of a timeout. In such a case, the client is unblocked automatically and the timeout callback is called.
RedisModule_IsBlockedReplyRequest
int RedisModule_IsBlockedReplyRequest(RedisModuleCtx *ctx);
Available since: 4.0.0
Return non-zero if a module command was called in order to fill the reply for a blocked client.
RedisModule_IsBlockedTimeoutRequest
int RedisModule_IsBlockedTimeoutRequest(RedisModuleCtx *ctx);
Available since: 4.0.0
Return non-zero if a module command was called in order to fill the reply for a blocked client that timed out.
RedisModule_GetBlockedClientPrivateData
void *RedisModule_GetBlockedClientPrivateData(RedisModuleCtx *ctx);
Available since: 4.0.0
Get the private data set by RedisModule_UnblockClient().
RedisModule_GetBlockedClientReadyKey
RedisModuleString *RedisModule_GetBlockedClientReadyKey(RedisModuleCtx *ctx);
Available since: 6.0.0
Get the key that is ready when the reply callback is called in the context
of a client blocked by RedisModule_BlockClientOnKeys()
.
RedisModule_GetBlockedClientHandle
RedisModuleBlockedClient *RedisModule_GetBlockedClientHandle(RedisModuleCtx *ctx);
Available since: 5.0.0
Get the blocked client associated with a given context. This is useful in the reply and timeout callbacks of blocked clients, since sometimes the module has the blocked client handle references around and wants to clean them up.
RedisModule_BlockedClientDisconnected
int RedisModule_BlockedClientDisconnected(RedisModuleCtx *ctx);
Available since: 5.0.0
Return true if, when the free callback of a blocked client is called, the reason for the client being unblocked is that it disconnected while it was blocked.
Thread Safe Contexts
RedisModule_GetThreadSafeContext
RedisModuleCtx *RedisModule_GetThreadSafeContext(RedisModuleBlockedClient *bc);
Available since: 4.0.0
Return a context which can be used inside threads to make Redis context
calls with certain modules APIs. If ‘bc’ is not NULL then the module will
be bound to a blocked client, and it will be possible to use the
RedisModule_Reply*
family of functions to accumulate a reply for when the
client will be unblocked. Otherwise the thread safe context will be
detached from any specific client.
To call non-reply APIs, the thread safe context must be prepared with:
RedisModule_ThreadSafeContextLock(ctx);
... make your call here ...
RedisModule_ThreadSafeContextUnlock(ctx);
This is not needed when using RedisModule_Reply*
functions, assuming
that a blocked client was used when the context was created; otherwise
no RedisModule_Reply
* call should be made at all.
NOTE: If you’re creating a detached thread safe context (bc is NULL),
consider using RM_GetDetachedThreadSafeContext
which will also retain
the module ID and thus be more useful for logging.
RedisModule_GetDetachedThreadSafeContext
RedisModuleCtx *RedisModule_GetDetachedThreadSafeContext(RedisModuleCtx *ctx);
Available since: 6.0.9
Return a detached thread safe context that is not associated with any specific blocked client, but is associated with the module’s context.
This is useful for modules that wish to hold a global context over a long term, for purposes such as logging.
RedisModule_FreeThreadSafeContext
void RedisModule_FreeThreadSafeContext(RedisModuleCtx *ctx);
Available since: 4.0.0
Release a thread safe context.
RedisModule_ThreadSafeContextLock
void RedisModule_ThreadSafeContextLock(RedisModuleCtx *ctx);
Available since: 4.0.0
Acquire the server lock before executing a thread safe API call.
This is not needed for RedisModule_Reply*
calls when there is
a blocked client connected to the thread safe context.
RedisModule_ThreadSafeContextTryLock
int RedisModule_ThreadSafeContextTryLock(RedisModuleCtx *ctx);
Available since: 6.0.8
Similar to RedisModule_ThreadSafeContextLock
but this function
would not block if the server lock is already acquired.
If successful (lock acquired) REDISMODULE_OK
is returned,
otherwise REDISMODULE_ERR
is returned and errno is set
accordingly.
RedisModule_ThreadSafeContextUnlock
void RedisModule_ThreadSafeContextUnlock(RedisModuleCtx *ctx);
Available since: 4.0.0
Release the server lock after a thread safe API call was executed.
Module Keyspace Notifications API
RedisModule_SubscribeToKeyspaceEvents
int RedisModule_SubscribeToKeyspaceEvents(RedisModuleCtx *ctx,
int types,
RedisModuleNotificationFunc callback);
Available since: 4.0.9
Subscribe to keyspace notifications. This is a low-level version of the keyspace-notifications API. A module can register callbacks to be notified when keyspace events occur.
Notification events are filtered by their type (string events, set events, etc), and the subscriber callback receives only events that match a specific mask of event types.
When subscribing to notifications with RedisModule_SubscribeToKeyspaceEvents
the module must provide an event type-mask, denoting the events the subscriber
is interested in. This can be an ORed mask of any of the following flags:
- REDISMODULE_NOTIFY_GENERIC: Generic commands like DEL, EXPIRE, RENAME
- REDISMODULE_NOTIFY_STRING: String events
- REDISMODULE_NOTIFY_LIST: List events
- REDISMODULE_NOTIFY_SET: Set events
- REDISMODULE_NOTIFY_HASH: Hash events
- REDISMODULE_NOTIFY_ZSET: Sorted Set events
- REDISMODULE_NOTIFY_EXPIRED: Expiration events
- REDISMODULE_NOTIFY_EVICTED: Eviction events
- REDISMODULE_NOTIFY_STREAM: Stream events
- REDISMODULE_NOTIFY_MODULE: Module types events
- REDISMODULE_NOTIFY_KEYMISS: Key-miss events
- REDISMODULE_NOTIFY_ALL: All events (excluding REDISMODULE_NOTIFY_KEYMISS)
- REDISMODULE_NOTIFY_LOADED: A special notification available only for modules, indicating that the key was loaded from persistence. Note that when this event fires, the given key can not be retained; use RM_CreateStringFromString instead.
We do not distinguish between key events and keyspace events, and it is up to the module to filter the actions taken based on the key.
The subscriber signature is:
int (*RedisModuleNotificationFunc) (RedisModuleCtx *ctx, int type,
const char *event,
RedisModuleString *key);
type
is the event type bit, that must match the mask given at registration
time. The event string is the actual command being executed, and key is the
relevant Redis key.
The notification callback gets executed with a Redis context that can not be used to send anything to the client, and has the db number where the event occurred as its selected db number.
Notice that it is not necessary to enable notifications in redis.conf for module notifications to work.
Warning: the notification callbacks are performed in a synchronous manner, so notification callbacks must be fast, or they will slow Redis down. If you need to take long actions, use threads to offload them.
See https://redis.io/topics/notifications for more information.
RedisModule_GetNotifyKeyspaceEvents
int RedisModule_GetNotifyKeyspaceEvents();
Available since: 6.0.0
Get the configured bitmap of notify-keyspace-events (could be used
for additional filtering in RedisModuleNotificationFunc
)
RedisModule_NotifyKeyspaceEvent
int RedisModule_NotifyKeyspaceEvent(RedisModuleCtx *ctx,
int type,
const char *event,
RedisModuleString *key);
Available since: 6.0.0
Expose notifyKeyspaceEvent to modules
Modules Cluster API
RedisModule_RegisterClusterMessageReceiver
void RedisModule_RegisterClusterMessageReceiver(RedisModuleCtx *ctx,
uint8_t type,
RedisModuleClusterMessageReceiver callback);
Available since: 5.0.0
Register a callback receiver for cluster messages of type ‘type’. If there was already a registered callback, this will replace the callback function with the one provided. Otherwise, if the callback is set to NULL and a callback is already registered for this message type, the callback is unregistered (so this API call is also used in order to delete the receiver).
RedisModule_SendClusterMessage
int RedisModule_SendClusterMessage(RedisModuleCtx *ctx,
const char *target_id,
uint8_t type,
const char *msg,
uint32_t len);
Available since: 5.0.0
Send a message to all the nodes in the cluster if target_id
is NULL, otherwise
to the specified target, which is a REDISMODULE_NODE_ID_LEN
bytes node ID, as
returned by the receiver callback or by the nodes iteration functions.
The function returns REDISMODULE_OK
if the message was successfully sent,
otherwise if the node is not connected or such node ID does not map to any
known cluster node, REDISMODULE_ERR
is returned.
RedisModule_GetClusterNodesList
char **RedisModule_GetClusterNodesList(RedisModuleCtx *ctx, size_t *numnodes);
Available since: 5.0.0
Return an array of string pointers, each string pointer points to a cluster
node ID of exactly REDISMODULE_NODE_ID_LEN
bytes (without any null term).
The number of returned node IDs is stored into *numnodes
.
However, if this function is called by a module not running on a Redis
instance with Redis Cluster enabled, NULL is returned instead.
The IDs returned can be used with RedisModule_GetClusterNodeInfo()
in order
to get more information about a single node.
The array returned by this function must be freed using the function
RedisModule_FreeClusterNodesList()
.
Example:
size_t count, j;
char **ids = RedisModule_GetClusterNodesList(ctx,&count);
for (j = 0; j < count; j++) {
RedisModule_Log(ctx,"notice","Node %.*s",
REDISMODULE_NODE_ID_LEN,ids[j]);
}
RedisModule_FreeClusterNodesList(ids);
RedisModule_FreeClusterNodesList
void RedisModule_FreeClusterNodesList(char **ids);
Available since: 5.0.0
Free the node list obtained with RedisModule_GetClusterNodesList
.
RedisModule_GetMyClusterID
const char *RedisModule_GetMyClusterID(void);
Available since: 5.0.0
Return this node ID (REDISMODULE_NODE_ID_LEN
bytes) or NULL if the cluster
is disabled.
RedisModule_GetClusterSize
size_t RedisModule_GetClusterSize(void);
Available since: 5.0.0
Return the number of nodes in the cluster, regardless of their state (handshake, noaddress, and so forth). The number of active nodes may therefore be smaller than, but never greater than, this number. If the instance is not in cluster mode, zero is returned.
RedisModule_GetClusterNodeInfo
int RedisModule_GetClusterNodeInfo(RedisModuleCtx *ctx,
const char *id,
char *ip,
char *master_id,
int *port,
int *flags);
Available since: 5.0.0
Populate the specified info for the node having as ID the specified ‘id’,
then returns REDISMODULE_OK
. Otherwise if the node ID does not exist from
the POV of this local node, REDISMODULE_ERR
is returned.
The arguments ip
, master_id
, port
and flags
can be NULL in case we don’t
need to populate back certain info. If an ip
and master_id
(only populated
if the instance is a slave) are specified, they point to buffers holding
at least REDISMODULE_NODE_ID_LEN
bytes. The strings written back as ip
and master_id
are not null terminated.
The list of flags reported is the following:
- REDISMODULE_NODE_MYSELF: This node
- REDISMODULE_NODE_MASTER: The node is a master
- REDISMODULE_NODE_SLAVE: The node is a replica
- REDISMODULE_NODE_PFAIL: We see the node as failing
- REDISMODULE_NODE_FAIL: The cluster agrees the node is failing
- REDISMODULE_NODE_NOFAILOVER: The slave is configured to never failover
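A sketch of querying a node's role and address (the `logNodeInfo` name is illustrative; note that the id and ip strings are not null terminated):

```c
/* Log the address and role of the node with the given node ID. */
void logNodeInfo(RedisModuleCtx *ctx, const char *id) {
    char ip[REDISMODULE_NODE_ID_LEN];
    int port, flags;
    if (RedisModule_GetClusterNodeInfo(ctx, id, ip, NULL, &port, &flags)
        == REDISMODULE_ERR) return;
    const char *role = (flags & REDISMODULE_NODE_MASTER) ? "master" : "replica";
    /* Not null terminated: print with an explicit length. */
    RedisModule_Log(ctx, "notice", "node %.*s (%s) port %d",
                    REDISMODULE_NODE_ID_LEN, id, role, port);
}
```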
RedisModule_SetClusterFlags
void RedisModule_SetClusterFlags(RedisModuleCtx *ctx, uint64_t flags);
Available since: 5.0.0
Set Redis Cluster flags in order to change the normal behavior of Redis Cluster, especially with the goal of disabling certain functions. This is useful for modules that use the Cluster API in order to create a different distributed system, but still want to use the Redis Cluster message bus. Flags that can be set:
- CLUSTER_MODULE_FLAG_NO_FAILOVER
- CLUSTER_MODULE_FLAG_NO_REDIRECTION
With the following effects:
- NO_FAILOVER: prevent Redis Cluster slaves from failing over a dead master. Also disables the replica migration feature.
- NO_REDIRECTION: Every node will accept any key, without trying to perform partitioning according to the Redis Cluster algorithm. Slots information will still be propagated across the cluster, but without effect.
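A module implementing its own distribution scheme might set both flags at load time. A sketch, assuming the public redismodule.h spellings REDISMODULE_CLUSTER_FLAG_NO_FAILOVER and REDISMODULE_CLUSTER_FLAG_NO_REDIRECTION for the flags listed above:

```c
int RedisModule_OnLoad(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
    REDISMODULE_NOT_USED(argv);
    REDISMODULE_NOT_USED(argc);
    if (RedisModule_Init(ctx, "mycluster", 1, REDISMODULE_APIVER_1)
        == REDISMODULE_ERR) return REDISMODULE_ERR;
    /* Keep using the cluster message bus, but opt out of failover
     * and key redirection. */
    RedisModule_SetClusterFlags(ctx, REDISMODULE_CLUSTER_FLAG_NO_FAILOVER |
                                     REDISMODULE_CLUSTER_FLAG_NO_REDIRECTION);
    return REDISMODULE_OK;
}
```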
Modules Timers API
Module timers are a high-precision “green timers” abstraction: every module can register even millions of timers without problems, even though the actual event loop has just a single timer that is used to wake the module timers subsystem in order to process the next event.
All the timers are stored in a radix tree, ordered by expire time. When the main Redis event loop timer callback is called, we try to process, one after the other, all the timers that have already expired. Then we re-enter the event loop, registering a timer that will expire when the next module timer to process expires.
Every time the list of active timers drops to zero, we unregister the main event loop timer, so that there is no overhead when such feature is not used.
RedisModule_CreateTimer
RedisModuleTimerID RedisModule_CreateTimer(RedisModuleCtx *ctx,
mstime_t period,
RedisModuleTimerProc callback,
void *data);
Available since: 5.0.0
Create a new timer that will fire after period
milliseconds, and will call
the specified function using data
as argument. The returned timer ID can be
used to get information from the timer or to stop it before it fires.
Note that for the common use case of a repeating timer (Re-registration
of the timer inside the RedisModuleTimerProc
callback) it matters when
this API is called:
If it is called at the beginning of ‘callback’, it means
the event will be triggered every ‘period’ milliseconds.
If it is called at the end of ‘callback’, it means
there will be a ‘period’-millisecond gap between events.
(If the time it takes to execute ‘callback’ is negligible, the two
statements above mean the same.)
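A minimal repeating-timer sketch where the callback re-registers itself at the end, so consecutive firings are spaced by the period plus the callback's own run time:

```c
/* Fires roughly once per second; re-registration at the end of the
 * callback means 1000 ms gaps *between* events. */
void tickCallback(RedisModuleCtx *ctx, void *data) {
    RedisModule_Log(ctx, "notice", "tick");
    RedisModule_CreateTimer(ctx, 1000, tickCallback, data);
}

/* Somewhere with a valid context, e.g. in a command implementation: */
RedisModuleTimerID id = RedisModule_CreateTimer(ctx, 1000, tickCallback, NULL);
```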
RedisModule_StopTimer
int RedisModule_StopTimer(RedisModuleCtx *ctx,
RedisModuleTimerID id,
void **data);
Available since: 5.0.0
Stop a timer, returns REDISMODULE_OK
if the timer was found, belonged to the
calling module, and was stopped, otherwise REDISMODULE_ERR
is returned.
If not NULL, the data pointer is set to the value of the data argument when
the timer was created.
RedisModule_GetTimerInfo
int RedisModule_GetTimerInfo(RedisModuleCtx *ctx,
RedisModuleTimerID id,
uint64_t *remaining,
void **data);
Available since: 5.0.0
Obtain information about a timer: its remaining time before firing
(in milliseconds), and the private data pointer associated with the timer.
If the timer specified does not exist or belongs to a different module
no information is returned and the function returns REDISMODULE_ERR
, otherwise
REDISMODULE_OK
is returned. The arguments remaining or data can be NULL if
the caller does not need certain information.
Modules EventLoop API
RedisModule_EventLoopAdd
int RedisModule_EventLoopAdd(int fd,
int mask,
RedisModuleEventLoopFunc func,
void *user_data);
Add a pipe / socket event to the event loop.
mask must be one of the following values:
- REDISMODULE_EVENTLOOP_READABLE
- REDISMODULE_EVENTLOOP_WRITABLE
- REDISMODULE_EVENTLOOP_READABLE | REDISMODULE_EVENTLOOP_WRITABLE
On success REDISMODULE_OK is returned, otherwise REDISMODULE_ERR is returned and errno is set to one of the following values:
- ERANGE: fd is negative or higher than the maxclients Redis config.
- EINVAL: callback is NULL or the mask value is invalid.
errno might take other values in case of an internal error.
Example:
void onReadable(int fd, void *user_data, int mask) {
char buf[32];
int bytes = read(fd,buf,sizeof(buf));
printf("Read %d bytes \n", bytes);
}
RM_EventLoopAdd(fd, REDISMODULE_EVENTLOOP_READABLE, onReadable, NULL);
RedisModule_EventLoopDel
int RedisModule_EventLoopDel(int fd, int mask);
Delete a pipe / socket event from the event loop.
mask must be one of the following values:
- REDISMODULE_EVENTLOOP_READABLE
- REDISMODULE_EVENTLOOP_WRITABLE
- REDISMODULE_EVENTLOOP_READABLE | REDISMODULE_EVENTLOOP_WRITABLE
On success REDISMODULE_OK is returned, otherwise REDISMODULE_ERR is returned and errno is set to one of the following values:
- ERANGE: fd is negative or higher than the maxclients Redis config.
- EINVAL: the mask value is invalid.
RedisModule_EventLoopAddOneShot
int RedisModule_EventLoopAddOneShot(RedisModuleEventLoopOneShotFunc func,
void *user_data);
This function can be called from other threads to trigger callback on Redis
main thread. On success REDISMODULE_OK
is returned. If func
is NULL
REDISMODULE_ERR
is returned and errno is set to EINVAL.
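A sketch of handing work from a background thread back to the main thread (`handleResult` and `publishResult` are hypothetical module-side names):

```c
/* Runs on the Redis main thread, where module APIs are safe to call. */
void onMainThread(void *user_data) {
    handleResult(user_data); /* Hypothetical module function. */
}

/* Called from a worker thread once a result is ready. */
void publishResult(void *result) {
    if (RedisModule_EventLoopAddOneShot(onMainThread, result)
        == REDISMODULE_ERR) {
        /* errno == EINVAL: the callback was NULL. */
    }
}
```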
Modules ACL API
Implements a hook into the authentication and authorization within Redis.
RedisModule_CreateModuleUser
RedisModuleUser *RedisModule_CreateModuleUser(const char *name);
Available since: 6.0.0
Creates a Redis ACL user that the module can use to authenticate a client.
After obtaining the user, the module should set what such user can do
using the RedisModule_SetModuleUserACL()
function. Once configured, the user
can be used in order to authenticate a connection, with the specified
ACL rules, using the RedisModule_AuthenticateClientWithUser()
function.
Note that:
- Users created here are not listed by the ACL command.
- Users created here are not checked for duplicated name, so it’s up to the module calling this function to take care of not creating users with the same name.
- The created user can be used to authenticate multiple Redis connections.
The caller can later free the user using the function
RedisModule_FreeModuleUser()
. When this function is called, if there are
still clients authenticated with this user, they are disconnected.
The function to free the user should only be used when the caller really
wants to invalidate the user to define a new one with different
capabilities.
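The typical flow can be sketched as follows (the user name and ACL rules are arbitrary examples; each RedisModule_SetModuleUserACL call applies one ACL SETUSER-style rule):

```c
/* Create a read-only user and authenticate the current client with it.
 * The user must stay alive while clients are authenticated with it. */
int authReadOnly(RedisModuleCtx *ctx) {
    RedisModuleUser *user = RedisModule_CreateModuleUser("mymodule-ro");
    if (RedisModule_SetModuleUserACL(user, "on") == REDISMODULE_ERR ||
        RedisModule_SetModuleUserACL(user, "allkeys") == REDISMODULE_ERR ||
        RedisModule_SetModuleUserACL(user, "+get") == REDISMODULE_ERR) {
        RedisModule_FreeModuleUser(user);
        return REDISMODULE_ERR;
    }
    return RedisModule_AuthenticateClientWithUser(ctx, user, NULL, NULL, NULL);
}
```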
RedisModule_FreeModuleUser
int RedisModule_FreeModuleUser(RedisModuleUser *user);
Available since: 6.0.0
Frees a given user and disconnects all of the clients that have been
authenticated with it. See RedisModule_CreateModuleUser
for detailed usage.
RedisModule_SetModuleUserACL
int RedisModule_SetModuleUserACL(RedisModuleUser *user, const char* acl);
Available since: 6.0.0
Sets the permissions of a user created through the redis module
interface. The syntax is the same as ACL SETUSER, so refer to the
documentation in acl.c for more information. See RedisModule_CreateModuleUser
for detailed usage.
Returns REDISMODULE_OK
on success and REDISMODULE_ERR
on failure
and will set an errno describing why the operation failed.
RedisModule_GetCurrentUserName
RedisModuleString *RedisModule_GetCurrentUserName(RedisModuleCtx *ctx);
Retrieve the user name of the client connection behind the current context.
The user name can be used later, in order to get a RedisModuleUser
.
See more information in RedisModule_GetModuleUserFromUserName
.
The returned string must be released with RedisModule_FreeString()
or by
enabling automatic memory management.
RedisModule_GetModuleUserFromUserName
RedisModuleUser *RedisModule_GetModuleUserFromUserName(RedisModuleString *name);
A RedisModuleUser
can be used to check if command, key or channel can be executed or
accessed according to the ACLs rules associated with that user.
When a Module wants to do ACL checks on a general ACL user (not created by RedisModule_CreateModuleUser
),
it can get the RedisModuleUser
from this API, based on the user name retrieved by RedisModule_GetCurrentUserName
.
Since a general ACL user can be deleted at any time, this RedisModuleUser
should be used only in the context
where this function was called. In order to do ACL checks out of that context, the Module can store the user name,
and call this API at any other context.
Returns NULL if the user is disabled or the user does not exist.
The caller should later free the user using the function RedisModule_FreeModuleUser()
.
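For instance, a module could check whether the client behind the current context may access a key, combining this function with the ACL-check APIs below (a sketch; `clientCanReadKey` is an illustrative name):

```c
/* Returns 1 if the current client's user may access 'key', else 0. */
int clientCanReadKey(RedisModuleCtx *ctx, RedisModuleString *key) {
    RedisModuleString *name = RedisModule_GetCurrentUserName(ctx);
    RedisModuleUser *user = RedisModule_GetModuleUserFromUserName(name);
    RedisModule_FreeString(ctx, name);
    if (user == NULL) return 0; /* User disabled or does not exist. */
    int ok = (RedisModule_ACLCheckKeyPermissions(user, key,
                  REDISMODULE_CMD_KEY_ACCESS) == REDISMODULE_OK);
    RedisModule_FreeModuleUser(user);
    return ok;
}
```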
RedisModule_ACLCheckCommandPermissions
int RedisModule_ACLCheckCommandPermissions(RedisModuleUser *user,
RedisModuleString **argv,
int argc);
Checks if the command can be executed by the user, according to the ACLs associated with it.
On success a REDISMODULE_OK
is returned, otherwise
REDISMODULE_ERR
is returned and errno is set to the following values:
- ENOENT: Specified command does not exist.
- EACCES: Command cannot be executed, according to ACL rules
RedisModule_ACLCheckKeyPermissions
int RedisModule_ACLCheckKeyPermissions(RedisModuleUser *user,
RedisModuleString *key,
int flags);
Check if the key can be accessed by the user according to the ACLs attached to the user
and the flags representing the key access. The flags are the same that are used in the
keyspec for logical operations. These flags are documented in RedisModule_SetCommandInfo
as
the REDISMODULE_CMD_KEY_ACCESS
, REDISMODULE_CMD_KEY_UPDATE
, REDISMODULE_CMD_KEY_INSERT
,
and REDISMODULE_CMD_KEY_DELETE
flags.
If no flags are supplied, the user is still required to have some access to the key for this command to return successfully.
If the user is able to access the key then REDISMODULE_OK
is returned, otherwise
REDISMODULE_ERR
is returned and errno is set to one of the following values:
- EINVAL: The provided flags are invalid.
- EACCES: The user does not have permission to access the key.
RedisModule_ACLCheckChannelPermissions
int RedisModule_ACLCheckChannelPermissions(RedisModuleUser *user,
RedisModuleString *ch,
int flags);
Check if the pubsub channel can be accessed by the user based off of the given
access flags. See RedisModule_ChannelAtPosWithFlags
for more information about the
possible flags that can be passed in.
If the user is able to access the pubsub channel then REDISMODULE_OK
is returned, otherwise
REDISMODULE_ERR
is returned and errno is set to one of the following values:
- EINVAL: The provided flags are invalid.
- EACCES: The user does not have permission to access the pubsub channel.
RedisModule_ACLAddLogEntry
void RedisModule_ACLAddLogEntry(RedisModuleCtx *ctx,
RedisModuleUser *user,
RedisModuleString *object);
Adds a new entry in the ACL log.
Returns REDISMODULE_OK
on success and REDISMODULE_ERR
on error.
For more information about ACL log, please refer to https://redis.io/commands/acl-log
RedisModule_AuthenticateClientWithUser
int RedisModule_AuthenticateClientWithUser(RedisModuleCtx *ctx,
RedisModuleUser *module_user,
RedisModuleUserChangedFunc callback,
void *privdata,
uint64_t *client_id);
Available since: 6.0.0
Authenticate the current context’s user with the provided redis acl user.
Returns REDISMODULE_ERR
if the user is disabled.
See authenticateClientWithUser for information about callback, client_id
,
and general usage for authentication.
RedisModule_AuthenticateClientWithACLUser
int RedisModule_AuthenticateClientWithACLUser(RedisModuleCtx *ctx,
const char *name,
size_t len,
RedisModuleUserChangedFunc callback,
void *privdata,
uint64_t *client_id);
Available since: 6.0.0
Authenticate the current context’s user with the provided redis acl user.
Returns REDISMODULE_ERR
if the user is disabled or the user does not exist.
See authenticateClientWithUser for information about callback, client_id
,
and general usage for authentication.
RedisModule_DeauthenticateAndCloseClient
int RedisModule_DeauthenticateAndCloseClient(RedisModuleCtx *ctx,
uint64_t client_id);
Available since: 6.0.0
Deauthenticate and close the client. The client resources will not be
immediately freed, but will be cleaned up in a background job. This is
the recommended way to deauthenticate a client since most clients can’t
handle users becoming deauthenticated. Returns REDISMODULE_ERR
when the
client doesn’t exist and REDISMODULE_OK
when the operation was successful.
The client ID is returned from the RedisModule_AuthenticateClientWithUser
and
RedisModule_AuthenticateClientWithACLUser
APIs, but can be obtained through
the CLIENT api or through server events.
This function is not thread safe, and must be executed within the context of a command or thread safe context.
RedisModule_GetClientCertificate
RedisModuleString *RedisModule_GetClientCertificate(RedisModuleCtx *ctx,
uint64_t client_id);
Available since: 6.0.9
Return the X.509 client-side certificate used by the client to authenticate this connection.
The return value is an allocated RedisModuleString
that is an X.509 certificate
encoded in PEM (Base64) format. It should be freed (or auto-freed) by the caller.
A NULL value is returned in the following conditions:
- Connection ID does not exist
- Connection is not a TLS connection
- Connection is a TLS connection but no client certificate was used
Modules Dictionary API
Implements a sorted dictionary (actually backed by a radix tree) with the usual get / set / del / num-items API, together with an iterator capable of going back and forth.
RedisModule_CreateDict
RedisModuleDict *RedisModule_CreateDict(RedisModuleCtx *ctx);
Available since: 5.0.0
Create a new dictionary. The ‘ctx’ pointer can be the current module context or NULL, depending on what you want. Please follow the following rules:
- Use a NULL context if you plan to retain a reference to this dictionary that will survive the time of the module callback where you created it.
- Use a NULL context if no context is available at the time you are creating the dictionary (of course…).
- However use the current callback context as ‘ctx’ argument if the dictionary time to live is just limited to the callback scope. In this case, if enabled, you can enjoy the automatic memory management that will reclaim the dictionary memory, as well as the strings returned by the Next / Prev dictionary iterator calls.
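A minimal sketch of a callback-scoped dictionary following the last rule above:

```c
/* Short-lived dict scoped to this callback: pass ctx so that automatic
 * memory management (if enabled) can reclaim it. */
RedisModuleDict *d = RedisModule_CreateDict(ctx);
RedisModule_DictSetC(d, (void*)"alpha", 5, "first");
RedisModule_DictSetC(d, (void*)"beta", 4, "second");

int nokey;
char *val = RedisModule_DictGetC(d, (void*)"alpha", 5, &nokey);
/* Here nokey is 0 and val points to "first". */

RedisModule_FreeDict(ctx, d);
```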
RedisModule_FreeDict
void RedisModule_FreeDict(RedisModuleCtx *ctx, RedisModuleDict *d);
Available since: 5.0.0
Free a dictionary created with RedisModule_CreateDict()
. You need to pass the
context pointer ‘ctx’ only if the dictionary was created using the
context instead of passing NULL.
RedisModule_DictSize
uint64_t RedisModule_DictSize(RedisModuleDict *d);
Available since: 5.0.0
Return the size of the dictionary (number of keys).
RedisModule_DictSetC
int RedisModule_DictSetC(RedisModuleDict *d,
void *key,
size_t keylen,
void *ptr);
Available since: 5.0.0
Store the specified key into the dictionary, setting its value to the
pointer ‘ptr’. If the key was added with success, since it did not
already exist, REDISMODULE_OK
is returned. Otherwise if the key already
exists the function returns REDISMODULE_ERR
.
RedisModule_DictReplaceC
int RedisModule_DictReplaceC(RedisModuleDict *d,
void *key,
size_t keylen,
void *ptr);
Available since: 5.0.0
Like RedisModule_DictSetC()
but will replace the key with the new
value if the key already exists.
RedisModule_DictSet
int RedisModule_DictSet(RedisModuleDict *d, RedisModuleString *key, void *ptr);
Available since: 5.0.0
Like RedisModule_DictSetC()
but takes the key as a RedisModuleString
.
RedisModule_DictReplace
int RedisModule_DictReplace(RedisModuleDict *d,
RedisModuleString *key,
void *ptr);
Available since: 5.0.0
Like RedisModule_DictReplaceC()
but takes the key as a RedisModuleString
.
RedisModule_DictGetC
void *RedisModule_DictGetC(RedisModuleDict *d,
void *key,
size_t keylen,
int *nokey);
Available since: 5.0.0
Return the value stored at the specified key. The function returns NULL both in the case the key does not exist, or if you actually stored NULL at key. So, optionally, if the ‘nokey’ pointer is not NULL, it will be set by reference to 1 if the key does not exist, or to 0 if the key exists.
RedisModule_DictGet
void *RedisModule_DictGet(RedisModuleDict *d,
RedisModuleString *key,
int *nokey);
Available since: 5.0.0
Like RedisModule_DictGetC()
but takes the key as a RedisModuleString
.
RedisModule_DictDelC
int RedisModule_DictDelC(RedisModuleDict *d,
void *key,
size_t keylen,
void *oldval);
Available since: 5.0.0
Remove the specified key from the dictionary, returning REDISMODULE_OK
if
the key was found and deleted, or REDISMODULE_ERR
if instead there was
no such key in the dictionary. When the operation is successful, if
‘oldval’ is not NULL, then ‘*oldval’ is set to the value stored at the
key before it was deleted. Using this feature it is possible to get
a pointer to the value (for instance in order to release it), without
having to call RedisModule_DictGet()
before deleting the key.
RedisModule_DictDel
int RedisModule_DictDel(RedisModuleDict *d,
RedisModuleString *key,
void *oldval);
Available since: 5.0.0
Like RedisModule_DictDelC()
but gets the key as a RedisModuleString
.
RedisModule_DictIteratorStartC
RedisModuleDictIter *RedisModule_DictIteratorStartC(RedisModuleDict *d,
const char *op,
void *key,
size_t keylen);
Available since: 5.0.0
Return an iterator, set up in order to start iterating from the specified key by applying the operator ‘op’, which is just a string specifying the comparison operator to use in order to seek the first element. The operators available are:
- ^ – Seek the first (lexicographically smaller) key.
- $ – Seek the last (lexicographically bigger) key.
- > – Seek the first element greater than the specified key.
- >= – Seek the first element greater or equal than the specified key.
- < – Seek the first element smaller than the specified key.
- <= – Seek the first element smaller or equal than the specified key.
- == – Seek the first element matching exactly the specified key.
Note that for ^
and $
the passed key is not used, and the user may
just pass NULL with a length of 0.
If the element to start the iteration cannot be seeked based on the
key and operator passed, RedisModule_DictNext()
/ Prev() will just return
REDISMODULE_ERR
at the first call, otherwise they’ll produce elements.
RedisModule_DictIteratorStart
RedisModuleDictIter *RedisModule_DictIteratorStart(RedisModuleDict *d,
const char *op,
RedisModuleString *key);
Available since: 5.0.0
Exactly like RedisModule_DictIteratorStartC
, but the key is passed as a
RedisModuleString
.
RedisModule_DictIteratorStop
void RedisModule_DictIteratorStop(RedisModuleDictIter *di);
Available since: 5.0.0
Release the iterator created with RedisModule_DictIteratorStart()
. This call
is mandatory otherwise a memory leak is introduced in the module.
RedisModule_DictIteratorReseekC
int RedisModule_DictIteratorReseekC(RedisModuleDictIter *di,
const char *op,
void *key,
size_t keylen);
Available since: 5.0.0
After its creation with RedisModule_DictIteratorStart()
, it is possible to
change the currently selected element of the iterator by using this
API call. The result based on the operator and key is exactly like
the function RedisModule_DictIteratorStart()
, however in this case the
return value is just REDISMODULE_OK
in case the seeked element was found,
or REDISMODULE_ERR
in case it was not possible to seek the specified
element. It is possible to reseek an iterator as many times as you want.
RedisModule_DictIteratorReseek
int RedisModule_DictIteratorReseek(RedisModuleDictIter *di,
const char *op,
RedisModuleString *key);
Available since: 5.0.0
Like RedisModule_DictIteratorReseekC() but takes the key as a
RedisModuleString
.
RedisModule_DictNextC
void *RedisModule_DictNextC(RedisModuleDictIter *di,
size_t *keylen,
void **dataptr);
Available since: 5.0.0
Return the current item of the dictionary iterator di
and steps to the
next element. If the iterator has already yielded the last element and there
are no other elements to return, NULL is returned, otherwise a pointer
to a string representing the key is provided, and the *keylen
length
is set by reference (if keylen is not NULL). The *dataptr
, if not NULL
is set to the value of the pointer stored at the returned key as auxiliary
data (as set by the RedisModule_DictSet
API).
Usage example:
... create the iterator here ...
char *key;
size_t keylen;
void *data;
while((key = RedisModule_DictNextC(iter,&keylen,&data)) != NULL) {
printf("%.*s %p\n", (int)keylen, key, data);
}
The returned pointer is of type void because sometimes it makes sense
to cast it to a char*
and sometimes to an unsigned char*
, depending on
whether it contains binary data, so this API ends up being more
convenient to use.
The validity of the returned pointer is until the next call to the next/prev iterator step. Also the pointer is no longer valid once the iterator is released.
RedisModule_DictPrevC
void *RedisModule_DictPrevC(RedisModuleDictIter *di,
size_t *keylen,
void **dataptr);
Available since: 5.0.0
This function is exactly like RedisModule_DictNext()
but after returning
the currently selected element in the iterator, it selects the previous
element (lexicographically smaller) instead of the next one.
RedisModule_DictNext
RedisModuleString *RedisModule_DictNext(RedisModuleCtx *ctx,
RedisModuleDictIter *di,
void **dataptr);
Available since: 5.0.0
Like RedisModule_DictNextC()
, but instead of returning an internally allocated
buffer and key length, it returns directly a module string object allocated
in the specified context ‘ctx’ (that may be NULL exactly like for the main
API RedisModule_CreateString
).
The returned string object should be deallocated after use, either manually or by using a context that has automatic memory management active.
RedisModule_DictPrev
RedisModuleString *RedisModule_DictPrev(RedisModuleCtx *ctx,
RedisModuleDictIter *di,
void **dataptr);
Available since: 5.0.0
Like RedisModule_DictNext()
but after returning the currently selected
element in the iterator, it selects the previous element (lexicographically
smaller) instead of the next one.
RedisModule_DictCompareC
int RedisModule_DictCompareC(RedisModuleDictIter *di,
const char *op,
void *key,
size_t keylen);
Available since: 5.0.0
Compare the element currently pointed by the iterator to the specified
element given by key/keylen, according to the operator ‘op’ (the set of
valid operators are the same valid for RedisModule_DictIteratorStart
).
If the comparison is successful the command returns REDISMODULE_OK
otherwise REDISMODULE_ERR
is returned.
This is useful when we want to just emit a lexicographical range, so in the loop, as we iterate elements, we can also check if we are still on range.
The function returns REDISMODULE_ERR
if the iterator has reached the
end-of-elements condition as well.
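Putting the iterator and compare functions together, a lexicographical range scan (all keys >= "a" and < "c") can be sketched like this:

```c
/* Emit every key of dict 'd' in the half-open range ["a", "c"). */
RedisModuleDictIter *iter =
    RedisModule_DictIteratorStartC(d, ">=", (void*)"a", 1);
char *key;
size_t keylen;
void *data;
/* Compare before consuming: the check applies to the element the
 * iterator currently points at, and fails at end of elements too. */
while (RedisModule_DictCompareC(iter, "<", (void*)"c", 1) == REDISMODULE_OK) {
    key = RedisModule_DictNextC(iter, &keylen, &data);
    if (key == NULL) break;
    printf("%.*s\n", (int)keylen, key);
}
RedisModule_DictIteratorStop(iter);
```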
RedisModule_DictCompare
int RedisModule_DictCompare(RedisModuleDictIter *di,
const char *op,
RedisModuleString *key);
Available since: 5.0.0
Like RedisModule_DictCompareC
but gets the key to compare with the current
iterator key as a RedisModuleString
.
Modules Info fields
RedisModule_InfoAddSection
int RedisModule_InfoAddSection(RedisModuleInfoCtx *ctx, const char *name);
Available since: 6.0.0
Used to start a new section, before adding any fields. The section name will
be prefixed by <modulename>_
and must only include A-Z,a-z,0-9.
NULL or empty string indicates the default section (only <modulename>
) is used.
When return value is REDISMODULE_ERR
, the section should and will be skipped.
RedisModule_InfoBeginDictField
int RedisModule_InfoBeginDictField(RedisModuleInfoCtx *ctx, const char *name);
Available since: 6.0.0
Starts a dict field, similar to the ones in INFO KEYSPACE. Use normal
RedisModule_InfoAddField
* functions to add the items to this field, and
terminate with RedisModule_InfoEndDictField
.
RedisModule_InfoEndDictField
int RedisModule_InfoEndDictField(RedisModuleInfoCtx *ctx);
Available since: 6.0.0
Ends a dict field, see RedisModule_InfoBeginDictField
RedisModule_InfoAddFieldString
int RedisModule_InfoAddFieldString(RedisModuleInfoCtx *ctx,
const char *field,
RedisModuleString *value);
Available since: 6.0.0
Used by RedisModuleInfoFunc
to add info fields.
Each field will be automatically prefixed by <modulename>_
.
Field names or values must not include \r\n
or :
.
RedisModule_InfoAddFieldCString
int RedisModule_InfoAddFieldCString(RedisModuleInfoCtx *ctx,
const char *field,
const char *value);
Available since: 6.0.0
See RedisModule_InfoAddFieldString()
.
RedisModule_InfoAddFieldDouble
int RedisModule_InfoAddFieldDouble(RedisModuleInfoCtx *ctx,
const char *field,
double value);
Available since: 6.0.0
See RedisModule_InfoAddFieldString()
.
RedisModule_InfoAddFieldLongLong
int RedisModule_InfoAddFieldLongLong(RedisModuleInfoCtx *ctx,
const char *field,
long long value);
Available since: 6.0.0
See RedisModule_InfoAddFieldString()
.
RedisModule_InfoAddFieldULongLong
int RedisModule_InfoAddFieldULongLong(RedisModuleInfoCtx *ctx,
const char *field,
unsigned long long value);
Available since: 6.0.0
See RedisModule_InfoAddFieldString()
.
RedisModule_RegisterInfoFunc
int RedisModule_RegisterInfoFunc(RedisModuleCtx *ctx, RedisModuleInfoFunc cb);
Available since: 6.0.0
Registers callback for the INFO command. The callback should add INFO fields
by calling the RedisModule_InfoAddField*()
functions.
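A sketch of an INFO callback using the section and dict-field helpers above (the field names and values are illustrative):

```c
/* Produces a "<modulename>_stats"-style section with a dict field,
 * similar to the INFO KEYSPACE layout. */
void myInfoFunc(RedisModuleInfoCtx *ctx, int for_crash_report) {
    REDISMODULE_NOT_USED(for_crash_report);
    if (RedisModule_InfoAddSection(ctx, "stats") == REDISMODULE_ERR) return;
    RedisModule_InfoAddFieldLongLong(ctx, "requests", 1024);
    RedisModule_InfoBeginDictField(ctx, "cache");
    RedisModule_InfoAddFieldLongLong(ctx, "hits", 900);
    RedisModule_InfoAddFieldLongLong(ctx, "misses", 124);
    RedisModule_InfoEndDictField(ctx);
}

/* In RedisModule_OnLoad: */
RedisModule_RegisterInfoFunc(ctx, myInfoFunc);
```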
RedisModule_GetServerInfo
RedisModuleServerInfoData *RedisModule_GetServerInfo(RedisModuleCtx *ctx,
const char *section);
Available since: 6.0.0
Get information about the server similar to the one that returns from the
INFO command. This function takes an optional ‘section’ argument that may
be NULL. The return value holds the output and can be used with
RedisModule_ServerInfoGetField
and alike to get the individual fields.
When done, it needs to be freed with RedisModule_FreeServerInfo
or with the
automatic memory management mechanism if enabled.
RedisModule_FreeServerInfo
void RedisModule_FreeServerInfo(RedisModuleCtx *ctx,
RedisModuleServerInfoData *data);
Available since: 6.0.0
Free data created with RedisModule_GetServerInfo()
. You need to pass the
context pointer ‘ctx’ only if the data was created using a
context instead of passing NULL.
RedisModule_ServerInfoGetField
RedisModuleString *RedisModule_ServerInfoGetField(RedisModuleCtx *ctx,
RedisModuleServerInfoData *data,
const char* field);
Available since: 6.0.0
Get the value of a field from data collected with RedisModule_GetServerInfo()
. You
need to pass the context pointer ‘ctx’ only if you want to use auto memory
mechanism to release the returned string. Return value will be NULL if the
field was not found.
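For example, reading the server version from the ‘server’ section (a sketch):

```c
RedisModuleServerInfoData *info = RedisModule_GetServerInfo(ctx, "server");
RedisModuleString *ver =
    RedisModule_ServerInfoGetField(ctx, info, "redis_version");
if (ver != NULL) {
    RedisModule_Log(ctx, "notice", "running on Redis %s",
                    RedisModule_StringPtrLen(ver, NULL));
    RedisModule_FreeString(ctx, ver);
}
RedisModule_FreeServerInfo(ctx, info);
```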
RedisModule_ServerInfoGetFieldC
const char *RedisModule_ServerInfoGetFieldC(RedisModuleServerInfoData *data,
const char* field);
Available since: 6.0.0
Similar to RedisModule_ServerInfoGetField
, but returns a char* which should not be freed by the caller.
RedisModule_ServerInfoGetFieldSigned
long long RedisModule_ServerInfoGetFieldSigned(RedisModuleServerInfoData *data,
const char* field,
int *out_err);
Available since: 6.0.0
Get the value of a field from data collected with RedisModule_GetServerInfo()
. If the
field is not found, or is not numerical or out of range, return value will be
0, and the optional out_err
argument will be set to REDISMODULE_ERR
.
RedisModule_ServerInfoGetFieldUnsigned
unsigned long long RedisModule_ServerInfoGetFieldUnsigned(RedisModuleServerInfoData *data,
const char* field,
int *out_err);
Available since: 6.0.0
Get the value of a field from data collected with RedisModule_GetServerInfo()
. If the
field is not found, or is not numerical or out of range, return value will be
0, and the optional out_err
argument will be set to REDISMODULE_ERR
.
RedisModule_ServerInfoGetFieldDouble
double RedisModule_ServerInfoGetFieldDouble(RedisModuleServerInfoData *data,
const char* field,
int *out_err);
Available since: 6.0.0
Get the value of a field from data collected with RedisModule_GetServerInfo()
. If the
field is not found, or is not a double, return value will be 0, and the
optional out_err
argument will be set to REDISMODULE_ERR
.
Modules utility APIs
RedisModule_GetRandomBytes
void RedisModule_GetRandomBytes(unsigned char *dst, size_t len);
Available since: 5.0.0
Return random bytes using SHA1 in counter mode with a /dev/urandom-initialized seed. This function is fast, so it can be used to generate many bytes without any effect on the operating system entropy pool. Currently this function is not thread safe.
RedisModule_GetRandomHexChars
void RedisModule_GetRandomHexChars(char *dst, size_t len);
Available since: 5.0.0
Like RedisModule_GetRandomBytes()
but instead of setting the string to random bytes, the string is set to
random characters in the hex charset [0-9a-f].
Modules API exporting / importing
RedisModule_ExportSharedAPI
int RedisModule_ExportSharedAPI(RedisModuleCtx *ctx,
const char *apiname,
void *func);
Available since: 5.0.4
This function is called by a module in order to export some API with a
given name. Other modules will be able to use this API by calling the
symmetrical function RedisModule_GetSharedAPI()
and casting the return value to
the right function pointer.
The function will return REDISMODULE_OK
if the name is not already taken,
otherwise REDISMODULE_ERR
will be returned and no operation will be
performed.
IMPORTANT: the apiname argument should be a string literal with static lifetime. The API relies on the fact that it will always be valid in the future.
RedisModule_GetSharedAPI
void *RedisModule_GetSharedAPI(RedisModuleCtx *ctx, const char *apiname);
Available since: 5.0.4
Request an exported API pointer. The return value is just a void pointer that the caller of this function will be required to cast to the right function pointer, so this is a private contract between modules.
If the requested API is not available then NULL is returned. Because modules can be loaded at different times and in different order, calls to this function should be put inside some generic API-registering step that is called every time a module attempts to execute a command that requires external APIs: if some API cannot be resolved, the command should return an error.
Here is an example:
int myCommandImplementation(RedisModuleCtx *ctx, ...) {
    if (getExternalAPIs(ctx) == 0) {
        /* Reply with an error here if we cannot have the APIs. */
    }
    // Use the API:
    myFunctionPointer(foo);
}
And the function getExternalAPIs() is:
int getExternalAPIs(RedisModuleCtx *ctx) {
    static int api_loaded = 0;
    if (api_loaded != 0) return 1; // APIs already resolved.
    myFunctionPointer = RedisModule_GetSharedAPI(ctx, "...");
    if (myFunctionPointer == NULL) return 0;
    api_loaded = 1;
    return 1;
}
Module Command Filter API
RedisModule_RegisterCommandFilter
RedisModuleCommandFilter *RedisModule_RegisterCommandFilter(RedisModuleCtx *ctx,
RedisModuleCommandFilterFunc callback,
int flags);
Available since: 5.0.5
Register a new command filter function.
Command filtering makes it possible for modules to extend Redis by plugging into the execution flow of all commands.
A registered filter gets called before Redis executes any command. This includes both core Redis commands and commands registered by any module. The filter applies in all execution paths including:
- Invocation by a client.
- Invocation through RedisModule_Call() by any module.
- Invocation through Lua `redis.call()`.
- Replication of a command from a master.
The filter executes in a special filter context, which is different and more
limited than a RedisModuleCtx
. Because the filter affects any command, it
must be implemented in a very efficient way to reduce the performance impact
on Redis. All Redis Module API calls that require a valid context (such as
RedisModule_Call()
, RedisModule_OpenKey()
, etc.) are not supported in a
filter context.
The RedisModuleCommandFilterCtx
can be used to inspect or modify the
executed command and its arguments. As the filter executes before Redis
begins processing the command, any change will affect the way the command is
processed. For example, a module can override Redis commands this way:
- Register a MODULE.SET command which implements an extended version of the Redis SET command.
- Register a command filter which detects invocation of SET on a specific pattern of keys. Once detected, the filter will replace the first argument from SET to MODULE.SET.
- When filter execution is complete, Redis considers the new command name and therefore executes the module's own command.
Note that in the above use case, if MODULE.SET
itself uses
RedisModule_Call()
the filter will be applied on that call as well. If
that is not desired, the REDISMODULE_CMDFILTER_NOSELF
flag can be set when
registering the filter.
The REDISMODULE_CMDFILTER_NOSELF
flag prevents execution flows that
originate from the module’s own RM_Call()
from reaching the filter. This
flag is effective for all execution flows, including nested ones, as long as
the execution begins from the module’s command context or a thread-safe
context that is associated with a blocking command.
Detached thread-safe contexts are not associated with the module and cannot be protected by this flag.
If multiple filters are registered (by the same or different modules), they are executed in the order of registration.
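A minimal sketch of the SET-override use case described above (the filter name `SetFilter` and the `MODULE.SET` command are hypothetical; real code should also match the key pattern before rewriting):

```
#include <strings.h>   /* strcasecmp */

static void SetFilter(RedisModuleCommandFilterCtx *filter) {
    if (RedisModule_CommandFilterArgsCount(filter) < 3) return;
    size_t len;
    const char *cmd = RedisModule_StringPtrLen(
        RedisModule_CommandFilterArgGet(filter, 0), &len);
    if (len == 3 && !strcasecmp(cmd, "set")) {
        /* Redirect SET to our extended MODULE.SET implementation. */
        RedisModule_CommandFilterArgReplace(filter, 0,
            RedisModule_CreateString(NULL, "MODULE.SET", 10));
    }
}

/* In RedisModule_OnLoad(): */
RedisModule_RegisterCommandFilter(ctx, SetFilter, REDISMODULE_CMDFILTER_NOSELF);
```

Passing REDISMODULE_CMDFILTER_NOSELF here avoids re-filtering MODULE.SET's own RedisModule_Call() invocations, as explained above.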
RedisModule_UnregisterCommandFilter
int RedisModule_UnregisterCommandFilter(RedisModuleCtx *ctx,
RedisModuleCommandFilter *filter);
Available since: 5.0.5
Unregister a command filter.
RedisModule_CommandFilterArgsCount
int RedisModule_CommandFilterArgsCount(RedisModuleCommandFilterCtx *fctx);
Available since: 5.0.5
Return the number of arguments a filtered command has. The number of arguments includes the command itself.
RedisModule_CommandFilterArgGet
RedisModuleString *RedisModule_CommandFilterArgGet(RedisModuleCommandFilterCtx *fctx,
int pos);
Available since: 5.0.5
Return the specified command argument. The first argument (position 0) is the command itself, and the rest are user-provided args.
RedisModule_CommandFilterArgInsert
int RedisModule_CommandFilterArgInsert(RedisModuleCommandFilterCtx *fctx,
int pos,
RedisModuleString *arg);
Available since: 5.0.5
Modify the filtered command by inserting a new argument at the specified
position. The specified RedisModuleString
argument may be used by Redis
after the filter context is destroyed, so it must not be auto-memory
allocated, freed or used elsewhere.
RedisModule_CommandFilterArgReplace
int RedisModule_CommandFilterArgReplace(RedisModuleCommandFilterCtx *fctx,
int pos,
RedisModuleString *arg);
Available since: 5.0.5
Modify the filtered command by replacing an existing argument with a new one.
The specified RedisModuleString
argument may be used by Redis after the
filter context is destroyed, so it must not be auto-memory allocated, freed
or used elsewhere.
RedisModule_CommandFilterArgDelete
int RedisModule_CommandFilterArgDelete(RedisModuleCommandFilterCtx *fctx,
int pos);
Available since: 5.0.5
Modify the filtered command by deleting an argument at the specified position.
RedisModule_MallocSize
size_t RedisModule_MallocSize(void* ptr);
Available since: 6.0.0
For a given pointer allocated via RedisModule_Alloc()
or
RedisModule_Realloc()
, return the amount of memory allocated for it.
Note that this may be different (larger) than the memory we allocated
with the allocation calls, since sometimes the underlying allocator
will allocate more memory.
RedisModule_GetUsedMemoryRatio
float RedisModule_GetUsedMemoryRatio();
Available since: 6.0.0
Return a number between 0 and 1 indicating the amount of memory currently used, relative to the Redis "maxmemory" configuration.
- 0 - No memory limit configured.
- Between 0 and 1 - The percentage of the memory used normalized in 0-1 range.
- Exactly 1 - Memory limit reached.
- Greater than 1 - More memory used than the configured limit.
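For illustration, the four cases above can be mapped to states with a small helper (`memory_state` is hypothetical, not part of the API):

```c
/* Interpret the value returned by RedisModule_GetUsedMemoryRatio(). */
const char *memory_state(float ratio) {
    if (ratio == 0.0f) return "no memory limit configured";
    if (ratio < 1.0f)  return "below the configured limit";
    if (ratio == 1.0f) return "memory limit reached";
    return "over the configured limit";
}
```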
Scanning keyspace and hashes
RedisModule_ScanCursorCreate
RedisModuleScanCursor *RedisModule_ScanCursorCreate();
Available since: 6.0.0
Create a new cursor to be used with RedisModule_Scan
RedisModule_ScanCursorRestart
void RedisModule_ScanCursorRestart(RedisModuleScanCursor *cursor);
Available since: 6.0.0
Restart an existing cursor. The keys will be rescanned.
RedisModule_ScanCursorDestroy
void RedisModule_ScanCursorDestroy(RedisModuleScanCursor *cursor);
Available since: 6.0.0
Destroy the cursor struct.
RedisModule_Scan
int RedisModule_Scan(RedisModuleCtx *ctx,
RedisModuleScanCursor *cursor,
RedisModuleScanCB fn,
void *privdata);
Available since: 6.0.0
Scan API that allows a module to scan all the keys and values in the selected db.
Callback for scan implementation.
void scan_callback(RedisModuleCtx *ctx, RedisModuleString *keyname,
RedisModuleKey *key, void *privdata);
- ctx: the Redis module context provided to the scan.
- keyname: owned by the caller and needs to be retained if used after this function.
- key: holds info on the key and value. It is provided as best effort, and in some cases it might be NULL, in which case the user should (can) use RedisModule_OpenKey() (and CloseKey too). When it is provided, it is owned by the caller and will be freed when the callback returns.
- privdata: the user data provided to RedisModule_Scan().
The way it should be used:
RedisModuleCursor *c = RedisModule_ScanCursorCreate();
while(RedisModule_Scan(ctx, c, callback, privateData));
RedisModule_ScanCursorDestroy(c);
It is also possible to use this API from another thread while the lock
is acquired during the actual call to RedisModule_Scan
:
RedisModuleCursor *c = RedisModule_ScanCursorCreate();
RedisModule_ThreadSafeContextLock(ctx);
while(RedisModule_Scan(ctx, c, callback, privateData)){
RedisModule_ThreadSafeContextUnlock(ctx);
// do some background job
RedisModule_ThreadSafeContextLock(ctx);
}
RedisModule_ScanCursorDestroy(c);
The function will return 1 if there are more elements to scan and 0 otherwise, possibly setting errno if the call failed.
It is also possible to restart an existing cursor using RedisModule_ScanCursorRestart
.
IMPORTANT: This API is very similar to the Redis SCAN command from the point of view of the guarantees it provides. This means that the API may report duplicated keys, but guarantees to report at least one time every key that was there from the start to the end of the scanning process.
NOTE: If you do database changes within the callback, you should be aware that the internal state of the database may change. For instance it is safe to delete or modify the current key, but may not be safe to delete any other key. Moreover playing with the Redis keyspace while iterating may have the effect of returning more duplicates. A safe pattern is to store the keys names you want to modify elsewhere, and perform the actions on the keys later when the iteration is complete. However this can cost a lot of memory, so it may make sense to just operate on the current key when possible during the iteration, given that this is safe.
RedisModule_ScanKey
int RedisModule_ScanKey(RedisModuleKey *key,
RedisModuleScanCursor *cursor,
RedisModuleScanKeyCB fn,
void *privdata);
Available since: 6.0.0
Scan API that allows a module to scan the elements in a hash, set or sorted set key.
Callback for scan implementation.
void scan_callback(RedisModuleKey *key, RedisModuleString* field, RedisModuleString* value, void *privdata);
- key - the Redis key context provided to the scan.
- field - field name, owned by the caller and needs to be retained if used after this function.
- value - value string or NULL for set type, owned by the caller and needs to be retained if used after this function.
- privdata - the user data provided to
RedisModule_ScanKey
.
The way it should be used:
RedisModuleCursor *c = RedisModule_ScanCursorCreate();
RedisModuleKey *key = RedisModule_OpenKey(...)
while(RedisModule_ScanKey(key, c, callback, privateData));
RedisModule_CloseKey(key);
RedisModule_ScanCursorDestroy(c);
It is also possible to use this API from another thread while the lock is acquired during
the actual call to RedisModule_ScanKey
, and re-opening the key each time:
RedisModuleCursor *c = RedisModule_ScanCursorCreate();
RedisModule_ThreadSafeContextLock(ctx);
RedisModuleKey *key = RedisModule_OpenKey(...);
while(RedisModule_ScanKey(key, c, callback, privateData)){
    RedisModule_CloseKey(key);
    RedisModule_ThreadSafeContextUnlock(ctx);
    // do some background job
    RedisModule_ThreadSafeContextLock(ctx);
    key = RedisModule_OpenKey(...);
}
RedisModule_CloseKey(key);
RedisModule_ScanCursorDestroy(c);
The function will return 1 if there are more elements to scan and 0 otherwise,
possibly setting errno if the call failed.
It is also possible to restart an existing cursor using RedisModule_ScanCursorRestart
.
NOTE: Certain operations are unsafe while iterating the object. For instance while the API guarantees to return at least one time all the elements that are present in the data structure consistently from the start to the end of the iteration (see HSCAN and similar commands documentation), the more you play with the elements, the more duplicates you may get. In general deleting the current element of the data structure is safe, while removing the key you are iterating is not safe.
Module fork API
RedisModule_Fork
int RedisModule_Fork(RedisModuleForkDoneHandler cb, void *user_data);
Available since: 6.0.0
Create a background child process with the current frozen snapshot of the
main process where you can do some processing in the background without
affecting / freezing the traffic and no need for threads and GIL locking.
Note that Redis allows for only one concurrent fork.
When the child wants to exit, it should call RedisModule_ExitFromChild
.
If the parent wants to kill the child it should call RedisModule_KillForkChild.
The done handler callback will be executed on the parent process when the
child exits (but not when it is killed).
Return: -1 on failure, on success the parent process will get a positive PID
of the child, and the child process will get 0.
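A sketch of the typical flow (the child's work function `do_background_work` is hypothetical):

```
void forkDoneHandler(int exitcode, int bypid, void *user_data) {
    /* Runs in the parent when the child exits (not when it is killed). */
}

int pid = RedisModule_Fork(forkDoneHandler, NULL);
if (pid == -1) {
    /* Fork failed, e.g. another fork is already in progress. */
} else if (pid == 0) {
    /* Child process: operate on the frozen snapshot. */
    do_background_work();          /* hypothetical */
    RedisModule_ExitFromChild(0);  /* retcode is passed to the done handler */
} else {
    /* Parent process: pid is the child PID; keep serving traffic. */
}
```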
RedisModule_SendChildHeartbeat
void RedisModule_SendChildHeartbeat(double progress);
Available since: 6.2.0
The module is advised to call this function from the fork child once in a while,
so that it can report progress and COW memory to the parent which will be
reported in INFO.
The progress argument should be between 0 and 1, or -1 when not available.
RedisModule_ExitFromChild
int RedisModule_ExitFromChild(int retcode);
Available since: 6.0.0
Call from the child process when you want to terminate it. retcode will be provided to the done handler executed on the parent process.
RedisModule_KillForkChild
int RedisModule_KillForkChild(int child_pid);
Available since: 6.0.0
Can be used to kill the forked child process from the parent process.
child_pid
would be the return value of RedisModule_Fork
.
Server hooks implementation
RedisModule_SubscribeToServerEvent
int RedisModule_SubscribeToServerEvent(RedisModuleCtx *ctx,
RedisModuleEvent event,
RedisModuleEventCallback callback);
Available since: 6.0.0
Register to be notified, via a callback, when the specified server event happens. The callback is called with the event as argument, and an additional argument which is a void pointer that should be cast to a specific event-dependent type (but many events will just use NULL since they do not have additional information to pass to the callback).
If the callback is NULL and there was a previous subscription, the module will be unsubscribed. If there was a previous subscription and the callback is not null, the old callback will be replaced with the new one.
The callback must be of this type:
void (*RedisModuleEventCallback)(RedisModuleCtx *ctx,
RedisModuleEvent eid,
uint64_t subevent,
void *data);
The ‘ctx’ is a normal Redis module context that the callback can use in order to call other modules APIs. The ‘eid’ is the event itself, this is only useful in the case the module subscribed to multiple events: using the ‘id’ field of this structure it is possible to check if the event is one of the events we registered with this callback. The ‘subevent’ field depends on the event that fired.
Finally the ‘data’ pointer may be populated, only for certain events, with more relevant data.
Here is a list of events you can use as ‘eid’ and related sub events:
-
RedisModuleEvent_ReplicationRoleChanged
:This event is called when the instance switches from master to replica or the other way around, however the event is also called when the replica remains a replica but starts to replicate with a different master.
The following sub events are available:
REDISMODULE_SUBEVENT_REPLROLECHANGED_NOW_MASTER
REDISMODULE_SUBEVENT_REPLROLECHANGED_NOW_REPLICA
The ‘data’ field can be cast by the callback to a RedisModuleReplicationInfo structure with the following fields:
int master; // true if master, false if replica
char *masterhost; // master instance hostname for NOW_REPLICA
int masterport; // master instance port for NOW_REPLICA
char *replid1; // Main replication ID
char *replid2; // Secondary replication ID
uint64_t repl1_offset; // Main replication offset
uint64_t repl2_offset; // Offset of replid2 validity
-
RedisModuleEvent_Persistence
This event is called when RDB saving or AOF rewriting starts and ends. The following sub events are available:
REDISMODULE_SUBEVENT_PERSISTENCE_RDB_START
REDISMODULE_SUBEVENT_PERSISTENCE_AOF_START
REDISMODULE_SUBEVENT_PERSISTENCE_SYNC_RDB_START
REDISMODULE_SUBEVENT_PERSISTENCE_SYNC_AOF_START
REDISMODULE_SUBEVENT_PERSISTENCE_ENDED
REDISMODULE_SUBEVENT_PERSISTENCE_FAILED
The above events are triggered not just when the user calls the relevant commands like BGSAVE, but also when a saving operation or AOF rewriting occurs because of internal server triggers. The SYNC_RDB_START sub event happens in the foreground due to the SAVE command, FLUSHALL, or server shutdown, while the other RDB and AOF sub events are executed in a background fork child, so any action the module takes can only affect the generated AOF or RDB, but will not be reflected in the parent process or affect connected clients and commands. Also note that the AOF_START sub event may end up saving RDB content in case of an AOF with rdb-preamble.
-
RedisModuleEvent_FlushDB
Called when FLUSHALL, FLUSHDB, or an internal flush (for instance because of replication, after replica synchronization) happens. The following sub events are available:
REDISMODULE_SUBEVENT_FLUSHDB_START
REDISMODULE_SUBEVENT_FLUSHDB_END
The data pointer can be cast to a RedisModuleFlushInfo structure with the following fields:
int32_t async; // True if the flush is done in a thread.
               // See for instance FLUSHALL ASYNC.
               // In this case the END callback is invoked
               // immediately after the database is put
               // in the free list of the thread.
int32_t dbnum; // Flushed database number, -1 for all the DBs
               // in the case of the FLUSHALL operation.
The start event is called before the operation is initiated, thus allowing the callback to call DBSIZE or other operation on the yet-to-free keyspace.
-
RedisModuleEvent_Loading
Called on loading operations: at startup when the server is started, but also after a first synchronization when the replica is loading the RDB file from the master. The following sub events are available:
REDISMODULE_SUBEVENT_LOADING_RDB_START
REDISMODULE_SUBEVENT_LOADING_AOF_START
REDISMODULE_SUBEVENT_LOADING_REPL_START
REDISMODULE_SUBEVENT_LOADING_ENDED
REDISMODULE_SUBEVENT_LOADING_FAILED
Note that AOF loading may start with an RDB data in case of rdb-preamble, in which case you’ll only receive an AOF_START event.
-
RedisModuleEvent_ClientChange
Called when a client connects or disconnects. The data pointer can be cast to a RedisModuleClientInfo structure, documented in RedisModule_GetClientInfoById(). The following sub events are available:
REDISMODULE_SUBEVENT_CLIENT_CHANGE_CONNECTED
REDISMODULE_SUBEVENT_CLIENT_CHANGE_DISCONNECTED
-
RedisModuleEvent_Shutdown
The server is shutting down. No subevents are available.
-
RedisModuleEvent_ReplicaChange
This event is called when the instance (which can be either a master or a replica) gets a new online replica, or loses a replica because it got disconnected. The following sub events are available:
REDISMODULE_SUBEVENT_REPLICA_CHANGE_ONLINE
REDISMODULE_SUBEVENT_REPLICA_CHANGE_OFFLINE
No additional information is available so far: future versions of Redis will have an API in order to enumerate the replicas connected and their state.
-
RedisModuleEvent_CronLoop
This event is called every time Redis calls the serverCron() function in order to do certain bookkeeping. Modules that are required to do operations from time to time may use this callback. Normally Redis calls this function 10 times per second, but this changes depending on the “hz” configuration. No sub events are available.
The data pointer can be cast to a RedisModuleCronLoop structure with the following fields:
int32_t hz; // Approximate number of events per second.
-
RedisModuleEvent_MasterLinkChange
This is called for replicas in order to notify when the replication link becomes functional (up) with our master, or when it goes down. Note that the link is not considered up when we just connected to the master, but only if the replication is happening correctly. The following sub events are available:
REDISMODULE_SUBEVENT_MASTER_LINK_UP
REDISMODULE_SUBEVENT_MASTER_LINK_DOWN
-
RedisModuleEvent_ModuleChange
This event is called when a new module is loaded or one is unloaded. The following sub events are available:
REDISMODULE_SUBEVENT_MODULE_LOADED
REDISMODULE_SUBEVENT_MODULE_UNLOADED
The data pointer can be cast to a RedisModuleModuleChange structure with the following fields:
const char* module_name; // Name of module loaded or unloaded.
int32_t module_version; // Module version.
-
RedisModuleEvent_LoadingProgress
This event is called repeatedly while an RDB or AOF file is being loaded. The following sub events are available:
REDISMODULE_SUBEVENT_LOADING_PROGRESS_RDB
REDISMODULE_SUBEVENT_LOADING_PROGRESS_AOF
The data pointer can be cast to a RedisModuleLoadingProgress structure with the following fields:
int32_t hz; // Approximate number of events per second.
int32_t progress; // Approximate progress between 0 and 1024,
                  // or -1 if unknown.
-
RedisModuleEvent_SwapDB
This event is called when a SWAPDB command has been successfully executed. No sub events are currently available for this event.
The data pointer can be cast to a RedisModuleSwapDbInfo structure with the following fields:
int32_t dbnum_first; // Swap Db first dbnum
int32_t dbnum_second; // Swap Db second dbnum
-
RedisModuleEvent_ReplBackup
WARNING: Replication Backup events are deprecated since Redis 7.0 and are never fired. See RedisModuleEvent_ReplAsyncLoad for understanding how Async Replication Loading events are now triggered when repl-diskless-load is set to swapdb.
Called when the repl-diskless-load config is set to swapdb, and Redis needs to back up the current database so it can possibly be restored later. A module with global data, and maybe with aux_load and aux_save callbacks, may need to use this notification to backup / restore / discard its globals. The following sub events are available:
REDISMODULE_SUBEVENT_REPL_BACKUP_CREATE
REDISMODULE_SUBEVENT_REPL_BACKUP_RESTORE
REDISMODULE_SUBEVENT_REPL_BACKUP_DISCARD
-
RedisModuleEvent_ReplAsyncLoad
Called when the repl-diskless-load config is set to swapdb and replication with a master sharing the same data set history (matching replication ID) occurs. In this case Redis serves the current data set while loading the new database into memory from the socket. Modules must have declared that they support this mechanism in order to activate it, through the REDISMODULE_OPTIONS_HANDLE_REPL_ASYNC_LOAD flag. The following sub events are available:
REDISMODULE_SUBEVENT_REPL_ASYNC_LOAD_STARTED
REDISMODULE_SUBEVENT_REPL_ASYNC_LOAD_ABORTED
REDISMODULE_SUBEVENT_REPL_ASYNC_LOAD_COMPLETED
-
RedisModuleEvent_ForkChild
Called when a fork child (AOFRW, RDBSAVE, module fork…) is born or dies. The following sub events are available:
REDISMODULE_SUBEVENT_FORK_CHILD_BORN
REDISMODULE_SUBEVENT_FORK_CHILD_DIED
-
RedisModuleEvent_EventLoop
Called on each event loop iteration, once just before the event loop goes to sleep or just after it wakes up. The following sub events are available:
REDISMODULE_SUBEVENT_EVENTLOOP_BEFORE_SLEEP
REDISMODULE_SUBEVENT_EVENTLOOP_AFTER_SLEEP
The function returns REDISMODULE_OK
if the module was successfully subscribed
for the specified event. If the API is called from a wrong context or unsupported event
is given then REDISMODULE_ERR
is returned.
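For example, a module could subscribe to flush events like this (the callback name is arbitrary, and this sketch assumes the callback type is void as in redismodule.h):

```
void flushdbCallback(RedisModuleCtx *ctx, RedisModuleEvent e,
                     uint64_t sub, void *data)
{
    RedisModuleFlushInfo *fi = data;
    if (sub == REDISMODULE_SUBEVENT_FLUSHDB_START) {
        RedisModule_Log(ctx, "notice",
            "Flush starting for db %d", (int)fi->dbnum);
    }
}

/* In RedisModule_OnLoad(): */
RedisModule_SubscribeToServerEvent(ctx, RedisModuleEvent_FlushDB,
                                   flushdbCallback);
```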
RedisModule_IsSubEventSupported
int RedisModule_IsSubEventSupported(RedisModuleEvent event, int64_t subevent);
Available since: 6.0.9
For a given server event and subevent, return zero if the subevent is not supported and non-zero otherwise.
Key eviction API
RedisModule_SetLRU
int RedisModule_SetLRU(RedisModuleKey *key, mstime_t lru_idle);
Available since: 6.0.0
Set the key last access time for LRU based eviction. Not relevant if the
server's maxmemory policy is LFU based. Value is idle time in milliseconds.
returns REDISMODULE_OK
if the LRU was updated, REDISMODULE_ERR
otherwise.
RedisModule_GetLRU
int RedisModule_GetLRU(RedisModuleKey *key, mstime_t *lru_idle);
Available since: 6.0.0
Gets the key last access time.
Value is idle time in milliseconds, or -1 if the server's eviction policy is
LFU based.
Returns REDISMODULE_OK if the key is valid.
RedisModule_SetLFU
int RedisModule_SetLFU(RedisModuleKey *key, long long lfu_freq);
Available since: 6.0.0
Set the key access frequency. Only relevant if the server's maxmemory policy
is LFU based.
The frequency is a logarithmic counter that provides an indication of
the access frequency only (must be <= 255).
returns REDISMODULE_OK
if the LFU was updated, REDISMODULE_ERR
otherwise.
RedisModule_GetLFU
int RedisModule_GetLFU(RedisModuleKey *key, long long *lfu_freq);
Available since: 6.0.0
Gets the key access frequency, or -1 if the server's eviction policy is not
LFU based.
Returns REDISMODULE_OK if the key is valid.
Miscellaneous APIs
RedisModule_GetContextFlagsAll
int RedisModule_GetContextFlagsAll();
Available since: 6.0.9
Returns the full ContextFlags mask. Using the return value, the module can check whether a certain set of flags is supported by the Redis server version in use. Example:
int supportedFlags = RM_GetContextFlagsAll();
if (supportedFlags & REDISMODULE_CTX_FLAGS_MULTI) {
// REDISMODULE_CTX_FLAGS_MULTI is supported
} else{
// REDISMODULE_CTX_FLAGS_MULTI is not supported
}
RedisModule_GetKeyspaceNotificationFlagsAll
int RedisModule_GetKeyspaceNotificationFlagsAll();
Available since: 6.0.9
Returns the full KeyspaceNotification mask. Using the return value, the module can check whether a certain set of flags is supported by the Redis server version in use. Example:
int supportedFlags = RM_GetKeyspaceNotificationFlagsAll();
if (supportedFlags & REDISMODULE_NOTIFY_LOADED) {
// REDISMODULE_NOTIFY_LOADED is supported
} else{
// REDISMODULE_NOTIFY_LOADED is not supported
}
RedisModule_GetServerVersion
int RedisModule_GetServerVersion();
Available since: 6.0.9
Return the Redis version in the format 0x00MMmmpp. For example, for version 6.0.7 the return value will be 0x00060007.
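Since the value packs the major, minor and patch numbers into one byte each, it can be unpacked with simple shifts (a hypothetical helper, shown for illustration):

```c
/* Unpack a version returned by RedisModule_GetServerVersion()
 * from the 0x00MMmmpp encoding. */
void decode_redis_version(int v, int *major, int *minor, int *patch) {
    *major = (v >> 16) & 0xff;
    *minor = (v >> 8) & 0xff;
    *patch = v & 0xff;
}
```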
RedisModule_GetTypeMethodVersion
int RedisModule_GetTypeMethodVersion();
Available since: 6.2.0
Return the current redis-server runtime value of REDISMODULE_TYPE_METHOD_VERSION
.
You can use that when calling RedisModule_CreateDataType
to know which fields of
RedisModuleTypeMethods
are going to be supported and which will be ignored.
RedisModule_ModuleTypeReplaceValue
int RedisModule_ModuleTypeReplaceValue(RedisModuleKey *key,
moduleType *mt,
void *new_value,
void **old_value);
Available since: 6.0.0
Replace the value assigned to a module type.
The key must be open for writing, have an existing value, and have a moduleType that matches the one specified by the caller.
Unlike RedisModule_ModuleTypeSetValue()
which will free the old value, this function
simply swaps the old value with the new value.
The function returns REDISMODULE_OK
on success, REDISMODULE_ERR
on errors
such as:
- Key is not opened for writing.
- Key is not a module data type key.
- Key is a module datatype other than ‘mt’.
If old_value
is non-NULL, the old value is returned by reference.
RedisModule_GetCommandKeysWithFlags
int *RedisModule_GetCommandKeysWithFlags(RedisModuleCtx *ctx,
RedisModuleString **argv,
int argc,
int *num_keys,
int **out_flags);
For a specified command, parse its arguments and return an array that
contains the indexes of all key name arguments. This function is
essentially a more efficient way to do COMMAND GETKEYS
.
The out_flags
argument is optional, and can be set to NULL.
When provided it is filled with REDISMODULE_CMD_KEY_
flags in matching
indexes with the key indexes of the returned array.
A NULL return value indicates the specified command has no keys, or an error condition. Error conditions are indicated by setting errno as follows:
- ENOENT: Specified command does not exist.
- EINVAL: Invalid command arity specified.
NOTE: The returned array is not a Redis Module object so it does not
get automatically freed even when auto-memory is used. The caller
must explicitly call RedisModule_Free()
to free it, same as the out_flags
pointer if
used.
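A sketch of the typical call pattern, including the explicit frees described above:

```
int num_keys, *flags;
int *keyidx = RedisModule_GetCommandKeysWithFlags(ctx, argv, argc,
                                                  &num_keys, &flags);
if (keyidx == NULL) {
    /* No keys, or an error: errno is set to ENOENT or EINVAL. */
} else {
    for (int i = 0; i < num_keys; i++) {
        /* argv[keyidx[i]] is a key name; flags[i] holds the
         * matching REDISMODULE_CMD_KEY_* flags. */
    }
    RedisModule_Free(keyidx);
    RedisModule_Free(flags);
}
```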
RedisModule_GetCommandKeys
int *RedisModule_GetCommandKeys(RedisModuleCtx *ctx,
RedisModuleString **argv,
int argc,
int *num_keys);
Available since: 6.0.9
Identical to RedisModule_GetCommandKeysWithFlags
when flags are not needed.
RedisModule_GetCurrentCommandName
const char *RedisModule_GetCurrentCommandName(RedisModuleCtx *ctx);
Available since: 6.2.5
Return the name of the command currently running.
Defrag API
RedisModule_RegisterDefragFunc
int RedisModule_RegisterDefragFunc(RedisModuleCtx *ctx,
RedisModuleDefragFunc cb);
Available since: 6.2.0
Register a defrag callback for global data, i.e. anything that the module may allocate that is not tied to a specific data type.
RedisModule_DefragShouldStop
int RedisModule_DefragShouldStop(RedisModuleDefragCtx *ctx);
Available since: 6.2.0
When the data type defrag callback iterates complex structures, this function should be called periodically. A zero (false) return indicates the callback may continue its work. A non-zero value (true) indicates it should stop.
When stopped, the callback may use RedisModule_DefragCursorSet()
to store its
position so it can later use RedisModule_DefragCursorGet()
to resume defragging.
When stopped and more work is left to be done, the callback should return 1. Otherwise, it should return 0.
NOTE: Modules should consider the frequency in which this function is called, so it generally makes sense to do small batches of work in between calls.
RedisModule_DefragCursorSet
int RedisModule_DefragCursorSet(RedisModuleDefragCtx *ctx,
unsigned long cursor);
Available since: 6.2.0
Store an arbitrary cursor value for future re-use.
This should only be called if RedisModule_DefragShouldStop()
has returned a non-zero
value and the defrag callback is about to exit without fully iterating its
data type.
This behavior is reserved to cases where late defrag is performed. Late
defrag is selected for keys that implement the free_effort
callback and
return a free_effort
value that is larger than the defrag
‘active-defrag-max-scan-fields’ configuration directive.
Smaller keys, keys that do not implement free_effort
or the global
defrag callback are not called in late-defrag mode. In those cases, a
call to this function will return REDISMODULE_ERR
.
The cursor may be used by the module to represent some progress into the module’s data type. Modules may also store additional cursor-related information locally and use the cursor as a flag that indicates when traversal of a new key begins. This is possible because the API makes a guarantee that concurrent defragmentation of multiple keys will not be performed.
RedisModule_DefragCursorGet
int RedisModule_DefragCursorGet(RedisModuleDefragCtx *ctx,
unsigned long *cursor);
Available since: 6.2.0
Fetch a cursor value that has been previously stored using RedisModule_DefragCursorSet()
.
If not called for a late defrag operation, REDISMODULE_ERR
will be returned and
the cursor should be ignored. See RedisModule_DefragCursorSet()
for more details on
defrag cursors.
RedisModule_DefragAlloc
void *RedisModule_DefragAlloc(RedisModuleDefragCtx *ctx, void *ptr);
Available since: 6.2.0
Defrag a memory allocation previously allocated by RedisModule_Alloc
, RedisModule_Calloc
, etc.
The defragmentation process involves allocating a new memory block and copying
the contents to it, like realloc()
.
If defragmentation was not necessary, NULL is returned and the operation has no other effect.
If a non-NULL value is returned, the caller should use the new pointer instead of the old one and update any reference to the old pointer, which must not be used again.
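A data-type defrag callback typically starts by trying to relocate its top-level allocation (the `mytype` struct is hypothetical; the callback signature matches RedisModuleTypeMethods' defrag field):

```
int MyTypeDefrag(RedisModuleDefragCtx *ctx, RedisModuleString *key,
                 void **value)
{
    struct mytype *o = *value, *moved;
    if ((moved = RedisModule_DefragAlloc(ctx, o)) != NULL)
        *value = o = moved;   /* use the new pointer from now on */
    /* ... continue with the object's internal allocations ... */
    return 0;                 /* 0 = defrag of this key is complete */
}
```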
RedisModule_DefragRedisModuleString
RedisModuleString *RedisModule_DefragRedisModuleString(RedisModuleDefragCtx *ctx,
RedisModuleString *str);
Available since: 6.2.0
Defrag a RedisModuleString
previously allocated by RedisModule_Alloc
, RedisModule_Calloc
, etc.
See RedisModule_DefragAlloc()
for more information on how the defragmentation process
works.
NOTE: It is only possible to defrag strings that have a single reference.
Typically this means strings retained with RedisModule_RetainString
or RedisModule_HoldString
may not be defragmentable. One exception is command argvs which, if retained
by the module, will end up with a single reference (because the reference
on the Redis side is dropped as soon as the command callback returns).
RedisModule_GetKeyNameFromDefragCtx
const RedisModuleString *RedisModule_GetKeyNameFromDefragCtx(RedisModuleDefragCtx *ctx);
Returns the name of the key currently being processed. There is no guarantee that the key name is always available, so this may return NULL.
RedisModule_GetDbIdFromDefragCtx
int RedisModule_GetDbIdFromDefragCtx(RedisModuleDefragCtx *ctx);
Returns the database id of the key currently being processed. There is no guarantee that this info is always available, so this may return -1.
Function index
RedisModule_ACLAddLogEntry
RedisModule_ACLCheckChannelPermissions
RedisModule_ACLCheckCommandPermissions
RedisModule_ACLCheckKeyPermissions
RedisModule_AbortBlock
RedisModule_Alloc
RedisModule_AuthenticateClientWithACLUser
RedisModule_AuthenticateClientWithUser
RedisModule_AutoMemory
RedisModule_AvoidReplicaTraffic
RedisModule_BlockClient
RedisModule_BlockClientOnKeys
RedisModule_BlockedClientDisconnected
RedisModule_BlockedClientMeasureTimeEnd
RedisModule_BlockedClientMeasureTimeStart
RedisModule_Call
RedisModule_CallReplyArrayElement
RedisModule_CallReplyAttribute
RedisModule_CallReplyAttributeElement
RedisModule_CallReplyBigNumber
RedisModule_CallReplyBool
RedisModule_CallReplyDouble
RedisModule_CallReplyInteger
RedisModule_CallReplyLength
RedisModule_CallReplyMapElement
RedisModule_CallReplyProto
RedisModule_CallReplySetElement
RedisModule_CallReplyStringPtr
RedisModule_CallReplyType
RedisModule_CallReplyVerbatim
RedisModule_Calloc
RedisModule_ChannelAtPosWithFlags
RedisModule_CloseKey
RedisModule_CommandFilterArgDelete
RedisModule_CommandFilterArgGet
RedisModule_CommandFilterArgInsert
RedisModule_CommandFilterArgReplace
RedisModule_CommandFilterArgsCount
RedisModule_CreateCommand
RedisModule_CreateDataType
RedisModule_CreateDict
RedisModule_CreateModuleUser
RedisModule_CreateString
RedisModule_CreateStringFromCallReply
RedisModule_CreateStringFromDouble
RedisModule_CreateStringFromLongDouble
RedisModule_CreateStringFromLongLong
RedisModule_CreateStringFromStreamID
RedisModule_CreateStringFromString
RedisModule_CreateStringPrintf
RedisModule_CreateSubcommand
RedisModule_CreateTimer
RedisModule_DbSize
RedisModule_DeauthenticateAndCloseClient
RedisModule_DefragAlloc
RedisModule_DefragCursorGet
RedisModule_DefragCursorSet
RedisModule_DefragRedisModuleString
RedisModule_DefragShouldStop
RedisModule_DeleteKey
RedisModule_DictCompare
RedisModule_DictCompareC
RedisModule_DictDel
RedisModule_DictDelC
RedisModule_DictGet
RedisModule_DictGetC
RedisModule_DictIteratorReseek
RedisModule_DictIteratorReseekC
RedisModule_DictIteratorStart
RedisModule_DictIteratorStartC
RedisModule_DictIteratorStop
RedisModule_DictNext
RedisModule_DictNextC
RedisModule_DictPrev
RedisModule_DictPrevC
RedisModule_DictReplace
RedisModule_DictReplaceC
RedisModule_DictSet
RedisModule_DictSetC
RedisModule_DictSize
RedisModule_DigestAddLongLong
RedisModule_DigestAddStringBuffer
RedisModule_DigestEndSequence
RedisModule_EmitAOF
RedisModule_EventLoopAdd
RedisModule_EventLoopAddOneShot
RedisModule_EventLoopDel
RedisModule_ExitFromChild
RedisModule_ExportSharedAPI
RedisModule_Fork
RedisModule_Free
RedisModule_FreeCallReply
RedisModule_FreeClusterNodesList
RedisModule_FreeDict
RedisModule_FreeModuleUser
RedisModule_FreeServerInfo
RedisModule_FreeString
RedisModule_FreeThreadSafeContext
RedisModule_GetAbsExpire
RedisModule_GetBlockedClientHandle
RedisModule_GetBlockedClientPrivateData
RedisModule_GetBlockedClientReadyKey
RedisModule_GetClientCertificate
RedisModule_GetClientId
RedisModule_GetClientInfoById
RedisModule_GetClientUserNameById
RedisModule_GetClusterNodeInfo
RedisModule_GetClusterNodesList
RedisModule_GetClusterSize
RedisModule_GetCommand
RedisModule_GetCommandKeys
RedisModule_GetCommandKeysWithFlags
RedisModule_GetContextFlags
RedisModule_GetContextFlagsAll
RedisModule_GetCurrentCommandName
RedisModule_GetCurrentUserName
RedisModule_GetDbIdFromDefragCtx
RedisModule_GetDbIdFromDigest
RedisModule_GetDbIdFromIO
RedisModule_GetDbIdFromModuleKey
RedisModule_GetDbIdFromOptCtx
RedisModule_GetDetachedThreadSafeContext
RedisModule_GetExpire
RedisModule_GetKeyNameFromDefragCtx
RedisModule_GetKeyNameFromDigest
RedisModule_GetKeyNameFromIO
RedisModule_GetKeyNameFromModuleKey
RedisModule_GetKeyNameFromOptCtx
RedisModule_GetKeyspaceNotificationFlagsAll
RedisModule_GetLFU
RedisModule_GetLRU
RedisModule_GetModuleUserFromUserName
RedisModule_GetMyClusterID
RedisModule_GetNotifyKeyspaceEvents
RedisModule_GetRandomBytes
RedisModule_GetRandomHexChars
RedisModule_GetSelectedDb
RedisModule_GetServerInfo
RedisModule_GetServerVersion
RedisModule_GetSharedAPI
RedisModule_GetThreadSafeContext
RedisModule_GetTimerInfo
RedisModule_GetToDbIdFromOptCtx
RedisModule_GetToKeyNameFromOptCtx
RedisModule_GetTypeMethodVersion
RedisModule_GetUsedMemoryRatio
RedisModule_HashGet
RedisModule_HashSet
RedisModule_HoldString
RedisModule_InfoAddFieldCString
RedisModule_InfoAddFieldDouble
RedisModule_InfoAddFieldLongLong
RedisModule_InfoAddFieldString
RedisModule_InfoAddFieldULongLong
RedisModule_InfoAddSection
RedisModule_InfoBeginDictField
RedisModule_InfoEndDictField
RedisModule_IsBlockedReplyRequest
RedisModule_IsBlockedTimeoutRequest
RedisModule_IsChannelsPositionRequest
RedisModule_IsIOError
RedisModule_IsKeysPositionRequest
RedisModule_IsModuleNameBusy
RedisModule_IsSubEventSupported
RedisModule_KeyAtPos
RedisModule_KeyAtPosWithFlags
RedisModule_KeyExists
RedisModule_KeyType
RedisModule_KillForkChild
RedisModule_LatencyAddSample
RedisModule_ListDelete
RedisModule_ListGet
RedisModule_ListInsert
RedisModule_ListPop
RedisModule_ListPush
RedisModule_ListSet
RedisModule_LoadDataTypeFromString
RedisModule_LoadDataTypeFromStringEncver
RedisModule_LoadDouble
RedisModule_LoadFloat
RedisModule_LoadLongDouble
RedisModule_LoadSigned
RedisModule_LoadString
RedisModule_LoadStringBuffer
RedisModule_LoadUnsigned
RedisModule_Log
RedisModule_LogIOError
RedisModule_MallocSize
RedisModule_Milliseconds
RedisModule_ModuleTypeGetType
RedisModule_ModuleTypeGetValue
RedisModule_ModuleTypeReplaceValue
RedisModule_ModuleTypeSetValue
RedisModule_MonotonicMicroseconds
RedisModule_NotifyKeyspaceEvent
RedisModule_OpenKey
RedisModule_PoolAlloc
RedisModule_PublishMessage
RedisModule_RandomKey
RedisModule_Realloc
RedisModule_RegisterClusterMessageReceiver
RedisModule_RegisterCommandFilter
RedisModule_RegisterDefragFunc
RedisModule_RegisterInfoFunc
RedisModule_Replicate
RedisModule_ReplicateVerbatim
RedisModule_ReplySetArrayLength
RedisModule_ReplySetAttributeLength
RedisModule_ReplySetMapLength
RedisModule_ReplySetSetLength
RedisModule_ReplyWithArray
RedisModule_ReplyWithAttribute
RedisModule_ReplyWithBigNumber
RedisModule_ReplyWithBool
RedisModule_ReplyWithCString
RedisModule_ReplyWithCallReply
RedisModule_ReplyWithDouble
RedisModule_ReplyWithEmptyArray
RedisModule_ReplyWithEmptyString
RedisModule_ReplyWithError
RedisModule_ReplyWithLongDouble
RedisModule_ReplyWithLongLong
RedisModule_ReplyWithMap
RedisModule_ReplyWithNull
RedisModule_ReplyWithNullArray
RedisModule_ReplyWithSet
RedisModule_ReplyWithSimpleString
RedisModule_ReplyWithString
RedisModule_ReplyWithStringBuffer
RedisModule_ReplyWithVerbatimString
RedisModule_ReplyWithVerbatimStringType
RedisModule_ResetDataset
RedisModule_RetainString
RedisModule_SaveDataTypeToString
RedisModule_SaveDouble
RedisModule_SaveFloat
RedisModule_SaveLongDouble
RedisModule_SaveSigned
RedisModule_SaveString
RedisModule_SaveStringBuffer
RedisModule_SaveUnsigned
RedisModule_Scan
RedisModule_ScanCursorCreate
RedisModule_ScanCursorDestroy
RedisModule_ScanCursorRestart
RedisModule_ScanKey
RedisModule_SelectDb
RedisModule_SendChildHeartbeat
RedisModule_SendClusterMessage
RedisModule_ServerInfoGetField
RedisModule_ServerInfoGetFieldC
RedisModule_ServerInfoGetFieldDouble
RedisModule_ServerInfoGetFieldSigned
RedisModule_ServerInfoGetFieldUnsigned
RedisModule_SetAbsExpire
RedisModule_SetClusterFlags
RedisModule_SetCommandInfo
RedisModule_SetDisconnectCallback
RedisModule_SetExpire
RedisModule_SetLFU
RedisModule_SetLRU
RedisModule_SetModuleOptions
RedisModule_SetModuleUserACL
RedisModule_SignalKeyAsReady
RedisModule_SignalModifiedKey
RedisModule_StopTimer
RedisModule_Strdup
RedisModule_StreamAdd
RedisModule_StreamDelete
RedisModule_StreamIteratorDelete
RedisModule_StreamIteratorNextField
RedisModule_StreamIteratorNextID
RedisModule_StreamIteratorStart
RedisModule_StreamIteratorStop
RedisModule_StreamTrimByID
RedisModule_StreamTrimByLength
RedisModule_StringAppendBuffer
RedisModule_StringCompare
RedisModule_StringDMA
RedisModule_StringPtrLen
RedisModule_StringSet
RedisModule_StringToDouble
RedisModule_StringToLongDouble
RedisModule_StringToLongLong
RedisModule_StringToStreamID
RedisModule_StringTruncate
RedisModule_SubscribeToKeyspaceEvents
RedisModule_SubscribeToServerEvent
RedisModule_ThreadSafeContextLock
RedisModule_ThreadSafeContextTryLock
RedisModule_ThreadSafeContextUnlock
RedisModule_TrimStringAllocation
RedisModule_UnblockClient
RedisModule_UnlinkKey
RedisModule_UnregisterCommandFilter
RedisModule_ValueLength
RedisModule_WrongArity
RedisModule_Yield
RedisModule_ZsetAdd
RedisModule_ZsetFirstInLexRange
RedisModule_ZsetFirstInScoreRange
RedisModule_ZsetIncrby
RedisModule_ZsetLastInLexRange
RedisModule_ZsetLastInScoreRange
RedisModule_ZsetRangeCurrentElement
RedisModule_ZsetRangeEndReached
RedisModule_ZsetRangeNext
RedisModule_ZsetRangePrev
RedisModule_ZsetRangeStop
RedisModule_ZsetRem
RedisModule_ZsetScore
RedisModule__Assert
7.2 - Redis modules and blocking commands
Redis has a few blocking commands among the built-in set of commands.
One of the most used is BLPOP (or the symmetric BRPOP), which blocks waiting for elements to arrive in a list.
The interesting fact about blocking commands is that they do not block
the whole server, but just the client calling them. Usually the reason to
block is that we expect some external event to happen: this can be
some change in the Redis data structures, as in the BLPOP case, a
long computation happening in a thread, receiving some data from the
network, and so forth.
Redis modules have the ability to implement blocking commands as well. This documentation shows how the API works and describes a few patterns that can be used in order to model blocking commands.
NOTE: This API is currently experimental, so it can only be used if
the macro REDISMODULE_EXPERIMENTAL_API is defined. This is required because
these calls are still not in their final stage of design, so they may change
in the future, certain parts may be deprecated, and so forth.
To use this part of the modules API, include the modules header like this:
#define REDISMODULE_EXPERIMENTAL_API
#include "redismodule.h"
How blocking and resuming works
Note: You may want to check the helloblock.c example in the Redis source tree, inside the src/modules directory, for a simple-to-understand example of how the blocking API is applied.
In Redis modules, commands are implemented by callback functions that are invoked by the Redis core when the specific command is called by the user. Normally the callback terminates its execution by sending some reply to the client. Using the following function instead, the function implementing the module command may request that the client be put into the blocked state:
RedisModuleBlockedClient *RedisModule_BlockClient(RedisModuleCtx *ctx, RedisModuleCmdFunc reply_callback, RedisModuleCmdFunc timeout_callback, void (*free_privdata)(void*), long long timeout_ms);
The function returns a RedisModuleBlockedClient
object, which is later
used in order to unblock the client. The arguments have the following
meaning:
- ctx is the command execution context, as usual in the rest of the API.
- reply_callback is the callback, having the same prototype of a normal command function, that is called when the client is unblocked in order to return a reply to the client.
- timeout_callback is the callback, having the same prototype of a normal command function, that is called when the client reaches the ms timeout.
- free_privdata is the callback that is called in order to free the private data. Private data is a pointer to some data that is passed from the API used to unblock the client to the callback that will send the reply to the client. We’ll see how this mechanism works later in this document.
- ms is the timeout in milliseconds. When the timeout is reached, the timeout callback is called and the client is automatically aborted.
Once a client is blocked, it can be unblocked with the following API:
int RedisModule_UnblockClient(RedisModuleBlockedClient *bc, void *privdata);
The function takes as argument the blocked client object returned by
the previous call to RedisModule_BlockClient()
, and unblocks the client.
Immediately before the client gets unblocked, the reply_callback
function
specified when the client was blocked is called: this function will
have access to the privdata
pointer used here.
IMPORTANT: The above function is thread safe, and can be called from within a thread doing some work in order to implement the command that blocked the client.
The privdata
data will be freed automatically using the free_privdata
callback when the client is unblocked. This is useful since the reply
callback may never be called in case the client times out or disconnects
from the server, so it’s important that an external function has the
responsibility to free the data passed, if needed.
To better understand how the API works, we can imagine writing a command that blocks a client for one second, and then sends “Hello!” as the reply.
Note: arity checks and other non-important things are not implemented in this command, in order to keep the example simple.
int Example_RedisCommand(RedisModuleCtx *ctx, RedisModuleString **argv,
                         int argc)
{
    RedisModuleBlockedClient *bc =
        RedisModule_BlockClient(ctx,reply_func,timeout_func,NULL,0);

    pthread_t tid;
    pthread_create(&tid,NULL,threadmain,bc);

    return REDISMODULE_OK;
}

void *threadmain(void *arg) {
    RedisModuleBlockedClient *bc = arg;
    sleep(1); /* Wait one second and unblock. */
    RedisModule_UnblockClient(bc,NULL);
    return NULL;
}
The above command blocks the client ASAP, spawning a thread that will wait a second and will unblock the client. Let’s check the reply and timeout callbacks, which are in our case very similar, since they just reply to the client with a different reply type.
int reply_func(RedisModuleCtx *ctx, RedisModuleString **argv,
               int argc)
{
    return RedisModule_ReplyWithSimpleString(ctx,"Hello!");
}

int timeout_func(RedisModuleCtx *ctx, RedisModuleString **argv,
                 int argc)
{
    return RedisModule_ReplyWithNull(ctx);
}
The reply callback just sends the “Hello!” string to the client. The important bit here is that the reply callback is called when the client is unblocked from the thread.
The timeout callback replies with NULL, as actual Redis blocking commands often do when timing out.
Passing reply data when unblocking
The above example is simple to understand but lacks an important real world aspect of an actual blocking command implementation: often the reply function will need to know what to reply to the client, and this information is often provided as the client is unblocked.
We could modify the above example so that the thread generates a random number after waiting one second. You can think of it as a genuinely expensive operation of some kind. Then this random number can be passed to the reply function so that we return it to the command caller. In order to make this work, we modify the functions as follows:
void *threadmain(void *arg) {
    RedisModuleBlockedClient *bc = arg;
    sleep(1); /* Wait one second and unblock. */
    long *mynumber = RedisModule_Alloc(sizeof(long));
    *mynumber = rand();
    RedisModule_UnblockClient(bc,mynumber);
    return NULL;
}
As you can see, now the unblocking call is passing some private data,
that is the mynumber
pointer, to the reply callback. In order to
obtain this private data, the reply callback will use the following
function:
void *RedisModule_GetBlockedClientPrivateData(RedisModuleCtx *ctx);
So our reply callback is modified as follows:
int reply_func(RedisModuleCtx *ctx, RedisModuleString **argv,
               int argc)
{
    long *mynumber = RedisModule_GetBlockedClientPrivateData(ctx);
    /* IMPORTANT: don't free mynumber here, but in the
     * free privdata callback. */
    return RedisModule_ReplyWithLongLong(ctx,*mynumber);
}
Note that we also need to pass a free_privdata
function when blocking
the client with RedisModule_BlockClient()
, since the allocated
long value must be freed. Our callback will look like the following:
void free_privdata(void *privdata) {
RedisModule_Free(privdata);
}
NOTE: It is important to stress that the private data is best freed in the
free_privdata callback, because the reply function may not be called
if the client disconnects or times out.
Note that the private data is also accessible from the timeout
callback, always using the GetBlockedClientPrivateData() API.
Aborting the blocking of a client
One problem that sometimes arises is that we need to allocate resources
in order to implement the non-blocking command. So we block the client,
then, for example, try to create a thread, but the thread creation function
returns an error. How do we recover in such a condition? We don’t want to
leave the client blocked, nor do we want to call UnblockClient(), because
this would trigger the reply callback.
In this case the best thing to do is to use the following function:
int RedisModule_AbortBlock(RedisModuleBlockedClient *bc);
Practically this is how to use it:
int Example_RedisCommand(RedisModuleCtx *ctx, RedisModuleString **argv,
                         int argc)
{
    RedisModuleBlockedClient *bc =
        RedisModule_BlockClient(ctx,reply_func,timeout_func,NULL,0);

    pthread_t tid;
    if (pthread_create(&tid,NULL,threadmain,bc) != 0) {
        RedisModule_AbortBlock(bc);
        RedisModule_ReplyWithError(ctx,"Sorry can't create a thread");
    }
    return REDISMODULE_OK;
}
The client will be unblocked but the reply callback will not be called.
Implementing the command, reply and timeout callback using a single function
The following functions can be used in order to implement the reply and timeout callbacks with the same function that implements the primary command function:
int RedisModule_IsBlockedReplyRequest(RedisModuleCtx *ctx);
int RedisModule_IsBlockedTimeoutRequest(RedisModuleCtx *ctx);
So I could rewrite the example command without using separate reply and timeout callbacks:
int Example_RedisCommand(RedisModuleCtx *ctx, RedisModuleString **argv,
                         int argc)
{
    if (RedisModule_IsBlockedReplyRequest(ctx)) {
        long *mynumber = RedisModule_GetBlockedClientPrivateData(ctx);
        return RedisModule_ReplyWithLongLong(ctx,*mynumber);
    } else if (RedisModule_IsBlockedTimeoutRequest(ctx)) {
        return RedisModule_ReplyWithNull(ctx);
    }

    /* Note that we now pass this very function as both the reply
     * and the timeout callback. */
    RedisModuleBlockedClient *bc =
        RedisModule_BlockClient(ctx,Example_RedisCommand,Example_RedisCommand,NULL,0);

    pthread_t tid;
    if (pthread_create(&tid,NULL,threadmain,bc) != 0) {
        RedisModule_AbortBlock(bc);
        RedisModule_ReplyWithError(ctx,"Sorry can't create a thread");
    }
    return REDISMODULE_OK;
}
Functionally it is the same, but some people will prefer the less verbose implementation that concentrates most of the command logic in a single function.
Working on copies of data inside a thread
An interesting pattern for working with threads that implement the slow part of a command is to work on a copy of the data, so that while some operation is performed on a key, the user continues to see the old version. However, when the thread has terminated its work, the representations are swapped and the new, processed version is used.
An example of this approach is the Neural Redis module where neural networks are trained in different threads while the user can still execute and inspect their older versions.
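The swap step of this pattern can be sketched in plain C11. Everything below is an illustrative assumption, not part of the modules API: a hypothetical Model value, a single atomic pointer holding the published version, and a publish step that swaps it in atomically so readers always observe either the old or the new representation, never a mix.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical value type, standing in for the module's data. */
typedef struct { int version; } Model;

/* The currently published representation, seen by all readers. */
static _Atomic(Model *) current;

/* The worker thread operates on a private copy of the data... */
Model *clone_model(const Model *m) {
    Model *copy = malloc(sizeof(*copy));
    memcpy(copy, m, sizeof(*copy));
    return copy;
}

/* ...and when done, publishes it with a single atomic swap. */
void publish_model(Model *processed) {
    Model *old = atomic_exchange(&current, processed);
    free(old); /* Safe here only because, as in Redis, reclamation is
                  done where no reader can still be holding 'old'. */
}
```

The point of the design is that the expensive work never touches the shared version; only the cheap pointer swap does.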
Future work
An API is work in progress right now in order to allow Redis modules APIs to be called in a safe way from threads, so that the threaded command can access the data space and do incremental operations.
There is no ETA for this feature but it may appear in the course of the Redis 4.0 release at some point.
7.3 - Modules API for native types
Redis modules can access Redis built-in data structures both at high level, by calling Redis commands, and at low level, by manipulating the data structures directly.
By using these capabilities in order to build new abstractions on top of existing Redis data structures, or by using strings DMA in order to encode modules data structures into Redis strings, it is possible to create modules that feel like they are exporting new data types. However, for more complex problems, this is not enough, and the implementation of new data structures inside the module is needed.
We call the ability of Redis modules to implement new data structures that
feel like native Redis ones native types support. This document describes
the API exported by the Redis modules system in order to create new data
structures and handle the serialization in RDB files, the rewriting process
in AOF, the type reporting via the TYPE
command, and so forth.
Overview of native types
A module exporting a native type is composed of the following main parts:
- The implementation of some kind of new data structure and of commands operating on the new data structure.
- A set of callbacks that handle: RDB saving, RDB loading, AOF rewriting, releasing of a value associated with a key, and calculation of a value digest (hash) to be used with the DEBUG DIGEST command.
- A 9-character name that is unique to each module native data type.
- An encoding version, used to persist into RDB files a module-specific data version, so that a module will be able to load older representations from RDB files.
While handling RDB loading, saving and AOF rewriting may look complex at first glance, the modules API provides very high level functions for handling all this, without requiring the user to handle read/write errors, so in practical terms writing a new data structure for Redis is a simple task.
A very easy to understand but complete example of a native type implementation
is available inside the Redis distribution in the /modules/hellotype.c file.
The reader is encouraged to read the documentation alongside this example
implementation to see how things are applied in practice.
Registering a new data type
In order to register a new native type into the Redis core, the module needs to declare a global variable that will hold a reference to the data type. The API to register the data type will return a data type reference that will be stored in the global variable.
static RedisModuleType *MyType;
#define MYTYPE_ENCODING_VERSION 0
int RedisModule_OnLoad(RedisModuleCtx *ctx) {
    RedisModuleTypeMethods tm = {
        .version = REDISMODULE_TYPE_METHOD_VERSION,
        .rdb_load = MyTypeRDBLoad,
        .rdb_save = MyTypeRDBSave,
        .aof_rewrite = MyTypeAOFRewrite,
        .free = MyTypeFree
    };

    MyType = RedisModule_CreateDataType(ctx, "MyType-AZ",
        MYTYPE_ENCODING_VERSION, &tm);
    if (MyType == NULL) return REDISMODULE_ERR;
    return REDISMODULE_OK;
}
As you can see from the example above, a single API call is needed in order to
register the new type. However a number of function pointers are passed as
arguments. Some are optional while others are mandatory. The above set
of methods must be passed, while .digest and .mem_usage are optional
and are currently not actually supported by the modules internals, so for
now you can just ignore them.
The ctx argument is the context that we receive in the OnLoad function.
The type name is a 9-character name drawn from the character set that includes A-Z, a-z, 0-9, plus the underscore _ and minus - characters.
Note that this name must be unique for each data type in the Redis ecosystem, so be creative, use both lower-case and upper case if it makes sense, and try to use the convention of mixing the type name with the name of the author of the module, to create a 9 character unique name.
NOTE: It is very important that the name is exactly 9 chars or the registration of the type will fail. Read more to understand why.
For example if I’m building a b-tree data structure and my name is antirez I’ll call my type btree1-az. The name, converted to a 64 bit integer, is stored inside the RDB file when saving the type, and will be used when the RDB data is loaded in order to resolve what module can load the data. If Redis finds no matching module, the integer is converted back to a name in order to provide some clue to the user about what module is missing in order to load the data.
The type name is also used as a reply for the TYPE
command when called
with a key holding the registered type.
The encver
argument is the encoding version used by the module to store data
inside the RDB file. For example I can start with an encoding version of 0,
but later when I release version 2.0 of my module, I can switch encoding to
something better. The new module will register with an encoding version of 1,
so when it saves new RDB files, the new version will be stored on disk. However
when loading RDB files, the module rdb_load
method will be called even if
there is data found for a different encoding version (and the encoding version
is passed as argument to rdb_load
), so that the module can still load old
RDB files.
The last argument is a structure used in order to pass the type methods to the
registration function: rdb_load, rdb_save, aof_rewrite, digest, mem_usage
and free are all callbacks with the following prototypes and uses:
typedef void *(*RedisModuleTypeLoadFunc)(RedisModuleIO *rdb, int encver);
typedef void (*RedisModuleTypeSaveFunc)(RedisModuleIO *rdb, void *value);
typedef void (*RedisModuleTypeRewriteFunc)(RedisModuleIO *aof, RedisModuleString *key, void *value);
typedef size_t (*RedisModuleTypeMemUsageFunc)(void *value);
typedef void (*RedisModuleTypeDigestFunc)(RedisModuleDigest *digest, void *value);
typedef void (*RedisModuleTypeFreeFunc)(void *value);
- rdb_load is called when loading data from the RDB file. It loads data in the same format as rdb_save produces.
- rdb_save is called when saving data to the RDB file.
- aof_rewrite is called when the AOF is being rewritten, and the module needs to tell Redis what is the sequence of commands to recreate the content of a given key.
- digest is called when DEBUG DIGEST is executed and a key holding this module type is found. Currently this is not yet implemented, so the function can be left empty.
- mem_usage is called when the MEMORY command asks for the total memory consumed by a specific key, and is used in order to get the amount of bytes used by the module value.
- free is called when a key with the module native type is deleted via DEL or by any other means, in order to let the module reclaim the memory associated with such a value.
Ok, but why do module types require a 9-character name?
Oh, I understand you need to understand this, so here is a very specific explanation.
When Redis persists to RDB files, module-specific data types need to be persisted as well. Now, RDB files are sequences of key-value pairs like the following:
[1 byte type] [key] [a type specific value]
The 1 byte type identifies strings, lists, sets, and so forth. In the case
of module data, it is set to a special value of module data, but of
course this is not enough: we need information that links a specific
value with the specific module type that is able to load and handle it.
So when we save a type specific value about a module, we prefix it with
a 64 bit integer. 64 bits is large enough to store the information needed
in order to look up the module that can handle that specific type, but is
short enough that we can prefix each module value we store inside the RDB
without making the final RDB file too big. At the same time, this solution
of prefixing the value with a 64 bit signature does not require doing
strange things like defining in the RDB header a list of module-specific
types. Everything is pretty simple.
So, what can you store in 64 bits in order to identify a given module in a reliable way? Well, if you build a character set of 64 symbols, you can easily store 9 characters of 6 bits each, and you are left with 10 bits, which are used in order to store the encoding version of the type, so that the same type can evolve in the future and provide a different and more efficient or updated serialization format for RDB files.
So the 64 bit prefix stored before each module value is like the following:
6|6|6|6|6|6|6|6|6|10
The first 9 elements are 6-bit characters, while the final 10 bits store the encoding version.
When the RDB file is loaded back, it reads the 64 bit value, masks the final 10 bits, and searches for a matching module in the modules types cache. When a matching one is found, the method to load the RDB file value is called with the 10 bits encoding version as argument, so that the module knows what version of the data layout to load, if it can support multiple versions.
Now the interesting thing about all this is that, if instead the module type cannot be resolved, since there is no loaded module having this signature, we can convert back the 64 bit value into a 9 characters name, and print an error to the user that includes the module type name! So that she or he immediately realizes what’s wrong.
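To make the bit layout concrete, here is a self-contained sketch of packing a 9-character name plus a 10-bit encoding version into 64 bits, and recovering both again. The charset is the 64-symbol set described above; the exact bit order Redis uses internally may differ, so treat this as illustrative rather than the actual implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The 64-symbol charset the docs describe: A-Z, a-z, 0-9, '-' and '_'. */
static const char charset[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/* Pack a 9-char type name and a 10-bit encoding version into 64 bits:
 * 9 chars * 6 bits = 54 bits, leaving 10 bits for the version. */
uint64_t encode_type_id(const char *name, int encver) {
    uint64_t id = 0;
    for (int j = 0; j < 9; j++) {
        const char *p = strchr(charset, name[j]);
        assert(p != NULL); /* the name must use only charset symbols */
        id = (id << 6) | (uint64_t)(p - charset);
    }
    return (id << 10) | (uint64_t)(encver & 1023);
}

/* Recover the name and encoding version from the 64-bit prefix, which
 * is what lets Redis print a readable type name when no module matches. */
void decode_type_id(uint64_t id, char name[10], int *encver) {
    *encver = (int)(id & 1023);
    id >>= 10;
    for (int j = 8; j >= 0; j--) {
        name[j] = charset[id & 63];
        id >>= 6;
    }
    name[9] = '\0';
}
```

The round trip is lossless, which is the whole trick: a 64-bit integer in the RDB file doubles as a human-readable module type name in error messages.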
Setting and getting keys
After registering our new data type in the RedisModule_OnLoad()
function,
we also need to be able to set Redis keys having as value our native type.
This normally happens in the context of commands that write data to a key. The native types API allows us to set and get keys holding module native data types, and to test whether a given key is already associated with a value of a specific data type.
The API uses the normal modules RedisModule_OpenKey()
low level key access
interface in order to deal with this. This is an example of setting a
native type private data structure to a Redis key:
RedisModuleKey *key = RedisModule_OpenKey(ctx,keyname,REDISMODULE_WRITE);
struct some_private_struct *data = createMyDataStructure();
RedisModule_ModuleTypeSetValue(key,MyType,data);
The function RedisModule_ModuleTypeSetValue()
is used with a key handle open
for writing, and gets three arguments: the key handle, the reference to the
native type, as obtained during the type registration, and finally a void*
pointer that contains the private data implementing the module native type.
Note that Redis has no clues at all about what your data contains. It will just call the callbacks you provided during the method registration in order to perform operations on the type.
Similarly we can retrieve the private data from a key using this function:
struct some_private_struct *data;
data = RedisModule_ModuleTypeGetValue(key);
We can also test for a key to have our native type as value:
if (RedisModule_ModuleTypeGetType(key) == MyType) {
/* ... do something ... */
}
However for the calls to do the right thing, we need to check if the key is empty, if it contains a value of the right kind, and so forth. So the idiomatic code to implement a command writing to our native type is along these lines:
RedisModuleKey *key = RedisModule_OpenKey(ctx,argv[1],
    REDISMODULE_READ|REDISMODULE_WRITE);
int type = RedisModule_KeyType(key);
if (type != REDISMODULE_KEYTYPE_EMPTY &&
    RedisModule_ModuleTypeGetType(key) != MyType)
{
    return RedisModule_ReplyWithError(ctx,REDISMODULE_ERRORMSG_WRONGTYPE);
}
Then, if we successfully verified that the key is not of the wrong type and we are going to write to it, we usually want to create a new data structure if the key is empty, or retrieve the reference to the value associated with the key if there is already one:
/* Create an empty value object if the key is currently empty. */
struct some_private_struct *data;
if (type == REDISMODULE_KEYTYPE_EMPTY) {
    data = createMyDataStructure();
    RedisModule_ModuleTypeSetValue(key,MyType,data);
} else {
    data = RedisModule_ModuleTypeGetValue(key);
}
/* Do something with 'data'... */
Free method
As already mentioned, when Redis needs to free a key holding a native type
value, it needs help from the module in order to release the memory. This
is the reason why we pass a free
callback during the type registration:
typedef void (*RedisModuleTypeFreeFunc)(void *value);
A trivial implementation of the free method can be something like this, assuming our data structure is composed of a single allocation:
void MyTypeFreeCallback(void *value) {
    RedisModule_Free(value);
}
However a more real world implementation will call some function that performs more complex memory reclaiming, by casting the void pointer to some structure and freeing all the resources composing the value.
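For instance, a free callback for the double_array type used in the RDB examples later in this document must release the nested allocation before the containing structure. So the sketch stands alone, RedisModule_Alloc/RedisModule_Free are stubbed here with malloc/free; a real module uses the redismodule.h calls directly.

```c
#include <stdlib.h>

/* Stand-ins so the sketch compiles outside a module build. */
#define RedisModule_Alloc  malloc
#define RedisModule_Free   free

struct double_array {
    size_t count;
    double *values;
};

/* Free callback for a value made of nested allocations: release the
 * members first, then the containing structure itself. */
void DoubleArrayFree(void *value) {
    struct double_array *da = value;
    RedisModule_Free(da->values);
    RedisModule_Free(da);
}
```

The ordering matters: freeing the outer struct first would lose the only pointer to the inner array and leak it.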
RDB load and save methods
The RDB saving and loading callbacks need to create (and load back) a representation of the data type on disk. Redis offers a high level API that can automatically store inside the RDB file the following types:
- Unsigned 64 bit integers.
- Signed 64 bit integers.
- Doubles.
- Strings.
It is up to the module to find a viable representation using the above base types. However note that while the integer and double values are stored and loaded in an architecture and endianness agnostic way, if you use the raw string saving API to, for example, save a structure on disk, you have to take care of those details yourself.
This is the list of functions performing RDB saving and loading:
void RedisModule_SaveUnsigned(RedisModuleIO *io, uint64_t value);
uint64_t RedisModule_LoadUnsigned(RedisModuleIO *io);
void RedisModule_SaveSigned(RedisModuleIO *io, int64_t value);
int64_t RedisModule_LoadSigned(RedisModuleIO *io);
void RedisModule_SaveString(RedisModuleIO *io, RedisModuleString *s);
void RedisModule_SaveStringBuffer(RedisModuleIO *io, const char *str, size_t len);
RedisModuleString *RedisModule_LoadString(RedisModuleIO *io);
char *RedisModule_LoadStringBuffer(RedisModuleIO *io, size_t *lenptr);
void RedisModule_SaveDouble(RedisModuleIO *io, double value);
double RedisModule_LoadDouble(RedisModuleIO *io);
The functions don’t require any error checking from the module, which can always assume that calls succeed.
As an example, imagine I have a native type that implements an array of double values, with the following structure:
struct double_array {
size_t count;
double *values;
};
My rdb_save method may look like the following:
void DoubleArrayRDBSave(RedisModuleIO *io, void *ptr) {
struct double_array *da = ptr;
RedisModule_SaveUnsigned(io,da->count);
for (size_t j = 0; j < da->count; j++)
RedisModule_SaveDouble(io,da->values[j]);
}
We stored the number of elements followed by each double value. So when we later have to load the structure in the rdb_load method, we’ll do something like this:
void *DoubleArrayRDBLoad(RedisModuleIO *io, int encver) {
if (encver != DOUBLE_ARRAY_ENC_VER) {
/* We should actually log an error here, or try to implement
the ability to load older versions of our data structure. */
return NULL;
}
struct double_array *da;
da = RedisModule_Alloc(sizeof(*da));
da->count = RedisModule_LoadUnsigned(io);
da->values = RedisModule_Alloc(da->count * sizeof(double));
for (size_t j = 0; j < da->count; j++)
da->values[j] = RedisModule_LoadDouble(io);
return da;
}
The load callback just reconstructs the data structure from the data we stored in the RDB file.
Note that while there is no error handling on the API that writes and reads from disk, the load callback can still return NULL on errors if what it reads does not look correct. Redis will just panic in that case.
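For instance, for the double_array type used in the example above, the free method must release both the nested buffer and the structure itself. A minimal sketch follows; the allocation stand-ins exist only so the snippet compiles outside a real module, which would include redismodule.h and use the genuine API:

```c
#include <stdlib.h>

/* Stand-ins so this sketch is self-contained; a real module includes
 * "redismodule.h" and uses the genuine allocation API instead. */
#define RedisModule_Alloc malloc
#define RedisModule_Free  free

struct double_array {
    size_t count;
    double *values;
};

/* Free callback: release the nested buffer first, then the struct
 * itself, so no allocation is leaked when Redis drops the key. */
void DoubleArrayFreeCallback(void *value) {
    struct double_array *da = value;
    RedisModule_Free(da->values);
    RedisModule_Free(da);
}
```

This is the function you would pass as the free method during type registration.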
AOF rewriting
void RedisModule_EmitAOF(RedisModuleIO *io, const char *cmdname, const char *fmt, ...);
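As a sketch of how this can be used: an aof_rewrite callback receives the key name and the value, and must emit a sequence of commands able to rebuild the value. For the double_array example, assuming a hypothetical MYTYPE.APPEND command that appends one double, the callback could emit one command per element. The stand-in types and the stand-in RedisModule_EmitAOF below exist only to make the snippet self-contained; a real module includes redismodule.h:

```c
#include <stdio.h>

/* Stand-ins so this sketch compiles outside a real module; in a module,
 * include "redismodule.h" and drop these. The counter only exists to
 * make the stand-in observable. */
typedef struct RedisModuleIO RedisModuleIO;
typedef struct RedisModuleString RedisModuleString;
int emitted_commands = 0;
static void RedisModule_EmitAOF(RedisModuleIO *io, const char *cmdname,
                                const char *fmt, ...) {
    (void)io; (void)cmdname; (void)fmt;
    emitted_commands++;
}

struct double_array {
    size_t count;
    double *values;
};

/* aof_rewrite callback: rebuild the value by replaying one hypothetical
 * MYTYPE.APPEND command per element. In the real API, "s" passes the
 * key as a RedisModuleString and "c" passes a C string argument. */
void DoubleArrayAofRewrite(RedisModuleIO *aof, RedisModuleString *key,
                           void *value) {
    struct double_array *da = value;
    char buf[64];
    for (size_t j = 0; j < da->count; j++) {
        /* Print the double with enough precision to round-trip. */
        snprintf(buf, sizeof(buf), "%.17g", da->values[j]);
        RedisModule_EmitAOF(aof, "MYTYPE.APPEND", "sc", key, buf);
    }
}
```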
Handling multiple encodings
WORK IN PROGRESS
Allocating memory
Module data types should use the RedisModule_Alloc() family of functions to allocate, reallocate, and release the heap memory used to implement the native data structures (see the other Redis modules documentation for detailed information).
This is not just useful for Redis to be able to account for the memory used by the module; there are also other advantages:
- Redis uses the jemalloc allocator, which often prevents the fragmentation problems that could be caused by using the libc allocator.
- When loading strings from the RDB file, the native types API is able to return strings allocated directly with RedisModule_Alloc(), so that the module can directly link this memory into the data structure representation, avoiding a useless copy of the data.
Even if you are using external libraries to implement your data structures, the allocation functions provided by the module API are fully compatible with malloc(), realloc(), free(), and strdup(), so converting a library to use these functions should be trivial.
In case you have an external library that uses libc malloc(), and you want to avoid manually replacing all the calls with the Redis modules API calls, one approach is to use simple macros to replace the libc calls with the Redis API calls. Something like this could work:
#define malloc RedisModule_Alloc
#define realloc RedisModule_Realloc
#define free RedisModule_Free
#define strdup RedisModule_Strdup
However, keep in mind that mixing libc calls with Redis API calls will result in crashes, so if you replace calls using macros, you need to make sure that all the calls are correctly replaced, and that the code with the substituted calls will never, for example, attempt to call RedisModule_Free() with a pointer allocated using libc malloc().
8 - RESP protocol spec
Redis clients use a protocol called RESP (REdis Serialization Protocol) to communicate with the Redis server. While the protocol was designed specifically for Redis, it can be used for other client-server software projects.
RESP is a compromise between the following things:
- Simple to implement.
- Fast to parse.
- Human readable.
RESP can serialize different data types like integers, strings, and arrays. There is also a specific type for errors. Requests are sent from the client to the Redis server as arrays of strings that represent the arguments of the command to execute. Redis replies with a command-specific data type.
RESP is binary-safe and does not require processing of bulk data transferred from one process to another because it uses prefixed lengths to transfer bulk data.
Note: the protocol outlined here is only used for client-server communication. Redis Cluster uses a different binary protocol in order to exchange messages between nodes.
Network layer
A client connects to a Redis server by creating a TCP connection to the port 6379.
While RESP is technically non-TCP specific, the protocol is only used with TCP connections (or equivalent stream-oriented connections like Unix sockets) in the context of Redis.
Request-Response model
Redis accepts commands composed of different arguments. Once a command is received, it is processed and a reply is sent back to the client.
This is the simplest model possible; however, there are two exceptions:
- Redis supports pipelining (covered later in this document). So it is possible for clients to send multiple commands at once and wait for replies later.
- When a Redis client subscribes to a Pub/Sub channel, the protocol changes semantics and becomes a push protocol. The client no longer requires sending commands because the server will automatically send new messages to the client (for the channels the client is subscribed to) as soon as they are received.
Excluding these two exceptions, the Redis protocol is a simple request-response protocol.
RESP protocol description
The RESP protocol was introduced in Redis 1.2, but it became the standard way for talking with the Redis server in Redis 2.0. This is the protocol you should implement in your Redis client.
RESP is actually a serialization protocol that supports the following data types: Simple Strings, Errors, Integers, Bulk Strings, and Arrays.
Redis uses RESP as a request-response protocol in the following way:
- Clients send commands to a Redis server as a RESP Array of Bulk Strings.
- The server replies with one of the RESP types according to the command implementation.
In RESP, the first byte determines the data type:
- For Simple Strings, the first byte of the reply is “+”.
- For Errors, the first byte of the reply is “-”.
- For Integers, the first byte of the reply is “:”.
- For Bulk Strings, the first byte of the reply is “$”.
- For Arrays, the first byte of the reply is “*”.
RESP can represent a Null value using a special variation of Bulk Strings or Arrays, as specified later.
In RESP, different parts of the protocol are always terminated with “\r\n” (CRLF).
RESP Simple Strings
Simple Strings are encoded as follows: a plus character, followed by a string that cannot contain a CR or LF character (no newlines are allowed), and terminated by CRLF (that is “\r\n”).
Simple Strings are used to transmit non binary-safe strings with minimal overhead. For example, many Redis commands reply with just “OK” on success. The RESP Simple String is encoded with the following 5 bytes:
"+OK\r\n"
In order to send binary-safe strings, use RESP Bulk Strings instead.
When Redis replies with a Simple String, a client library should return to the caller a string composed of the first character after the ‘+’ up to the end of the string, excluding the final CRLF bytes.
RESP Errors
RESP has a specific data type for errors. They are similar to RESP Simple Strings, but the first character is a minus ‘-’ character instead of a plus. The real difference between Simple Strings and Errors in RESP is that clients treat errors as exceptions, and the string that composes the Error type is the error message itself.
The basic format is:
"-Error message\r\n"
Error replies are only sent when something goes wrong, for instance if you try to perform an operation against the wrong data type, or if the command does not exist. The client should raise an exception when it receives an Error reply.
The following are examples of error replies:
-ERR unknown command 'helloworld'
-WRONGTYPE Operation against a key holding the wrong kind of value
The first word after the “-”, up to the first space or newline, represents the kind of error returned. This is just a convention used by Redis and is not part of the RESP Error format.
For example, ERR is the generic error, while WRONGTYPE is a more specific error that implies that the client tried to perform an operation against the wrong data type. This is called an Error Prefix and is a way to allow the client to understand the kind of error returned by the server without checking the exact error message.
A client implementation may return different types of exceptions for different errors or provide a generic way to trap errors by directly providing the error name to the caller as a string.
However, such a feature should not be considered vital as it is rarely useful, and a limited client implementation may simply return a generic error condition, such as false.
RESP Integers
This type is just a CRLF-terminated string that represents an integer, prefixed by a “:” byte. For example, “:0\r\n” and “:1000\r\n” are integer replies.
Many Redis commands return RESP Integers, like INCR, LLEN, and LASTSAVE.
There is no special meaning for the returned integer. It is just an incremental number for INCR, a UNIX time for LASTSAVE, and so forth. However, the returned integer is guaranteed to be in the range of a signed 64-bit integer.
Integer replies are also used in order to return true or false. For instance, commands like EXISTS or SISMEMBER will return 1 for true and 0 for false. Other commands like SADD, SREM, and SETNX will return 1 if the operation was actually performed and 0 otherwise.
The following commands will reply with an integer: SETNX, DEL, EXISTS, INCR, INCRBY, DECR, DECRBY, DBSIZE, LASTSAVE, RENAMENX, MOVE, LLEN, SADD, SREM, SISMEMBER, SCARD.
RESP Bulk Strings
Bulk Strings are used in order to represent a single binary-safe string up to 512 MB in length.
Bulk Strings are encoded in the following way:
- A “$” byte followed by the number of bytes composing the string (a prefixed length), terminated by CRLF.
- The actual string data.
- A final CRLF.
So the string “hello” is encoded as follows:
"$5\r\nhello\r\n"
An empty string is encoded as:
"$0\r\n\r\n"
RESP Bulk Strings can also be used in order to signal non-existence of a value using a special format to represent a Null value. In this format, the length is -1, and there is no data. Null is represented as:
"$-1\r\n"
This is called a Null Bulk String.
The client library API should not return an empty string, but a nil object, when the server replies with a Null Bulk String. For example, a Ruby library should return ‘nil’ while a C library should return NULL (or set a special flag in the reply object).
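As an illustrative sketch, a reader for Bulk Strings has to handle three cases: a regular string, the empty string, and the Null Bulk String (length -1). Assuming the whole reply is already buffered, the logic looks like this (a hypothetical helper, not part of any client library):

```c
#include <stdlib.h>
#include <string.h>

/* Parse a RESP Bulk String from 'buf' ("$<len>\r\n<data>\r\n").
 * Returns a malloc()ed copy of the payload and stores its length in
 * *lenptr, or returns NULL with *lenptr == -1 for the Null Bulk String
 * ("$-1\r\n"). Assumes the complete reply is already in the buffer. */
char *parse_bulk_string(const char *buf, long *lenptr) {
    long len = strtol(buf + 1, NULL, 10); /* Skip '$', read the length. */
    *lenptr = len;
    if (len == -1) return NULL;           /* Null Bulk String: no data. */
    const char *data = strchr(buf, '\n') + 1; /* Move past "<len>\r\n". */
    char *out = malloc(len + 1);
    memcpy(out, data, len);               /* Binary-safe copy: no scanning. */
    out[len] = '\0';
    return out;
}
```

Note how the Null case is reported out of band (a NULL pointer) instead of an empty string, matching the behavior described above.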
RESP Arrays
Clients send commands to the Redis server using RESP Arrays. Similarly, certain Redis commands that return collections of elements to the client use RESP Arrays as their replies. An example is the LRANGE command that returns elements of a list.
RESP Arrays are sent using the following format:
- A “*” character as the first byte, followed by the number of elements in the array as a decimal number, followed by CRLF.
- An additional RESP type for every element of the Array.
So an empty Array is just the following:
"*0\r\n"
While an array of two RESP Bulk Strings “hello” and “world” is encoded as:
"*2\r\n$5\r\nhello\r\n$5\r\nworld\r\n"
As you can see, after the *<count>CRLF part prefixing the array, the other data types composing the array are just concatenated one after the other.
For example, an Array of three integers is encoded as follows:
"*3\r\n:1\r\n:2\r\n:3\r\n"
Arrays can contain mixed types, so it’s not necessary for the elements to be of the same type. For instance, a list of four integers and a bulk string can be encoded as follows:
*5\r\n
:1\r\n
:2\r\n
:3\r\n
:4\r\n
$5\r\n
hello\r\n
(The reply was split into multiple lines for clarity).
The first line the server sent is *5\r\n in order to specify that five replies will follow. Then every reply constituting the items of the Multi Bulk reply is transmitted.
Null Arrays exist as well and are an alternative way to specify a Null value (usually the Null Bulk String is used, but for historical reasons we have two formats).
For instance, when the BLPOP command times out, it returns a Null Array that has a count of -1, as in the following example:
"*-1\r\n"
A client library API should return a null object and not an empty Array when Redis replies with a Null Array. This is necessary to distinguish between an empty list and a different condition (for instance the timeout condition of the BLPOP command).
Nested arrays are possible in RESP. For example, a nested array of two arrays is encoded as follows:
*2\r\n
*3\r\n
:1\r\n
:2\r\n
:3\r\n
*2\r\n
+Hello\r\n
-World\r\n
(The format was split into multiple lines to make it easier to read).
The above RESP data type encodes a two-element Array consisting of an Array that contains three Integers (1, 2, 3) and an array of a Simple String and an Error.
Null elements in Arrays
Single elements of an Array may be Null. This is used in Redis replies to signal that these elements are missing and not empty strings. This can happen with the SORT command when used with the GET pattern option if the specified key is missing. Example of an Array reply containing a Null element:
*3\r\n
$3\r\n
hello\r\n
$-1\r\n
$3\r\n
world\r\n
The second element is a Null. The client library should return something like this:
["hello",nil,"world"]
Note that this is not an exception to what was said in the previous sections, but an example to further specify the protocol.
Send commands to a Redis server
Now that you are familiar with the RESP serialization format, you can use it to help write a Redis client library. We can further specify how the interaction between the client and the server works:
- A client sends the Redis server a RESP Array consisting of only Bulk Strings.
- A Redis server replies to clients, sending any valid RESP data type as a reply.
So for example a typical interaction could be the following.
The client sends the command LLEN mylist in order to get the length of the list stored at key mylist. Then the server replies with an Integer reply as in the following example (C: is the client, S: the server).
C: *2\r\n
C: $4\r\n
C: LLEN\r\n
C: $6\r\n
C: mylist\r\n
S: :48293\r\n
As usual, we separate different parts of the protocol with newlines for simplicity, but the actual interaction is the client sending *2\r\n$4\r\nLLEN\r\n$6\r\nmylist\r\n as a whole.
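The client side of this interaction is straightforward to implement: serialize the argument vector as an array header followed by one Bulk String per argument. A minimal sketch (a hypothetical helper, not from any client library; for brevity it assumes text arguments, while a binary-safe encoder would use memcpy with explicit lengths):

```c
#include <stdio.h>
#include <string.h>

/* Encode argc/argv as a RESP Array of Bulk Strings into 'out'.
 * Returns the number of bytes written. 'outsize' must be large enough;
 * a real client would grow the buffer dynamically. */
size_t resp_encode_command(char *out, size_t outsize,
                           int argc, const char **argv) {
    size_t n = snprintf(out, outsize, "*%d\r\n", argc);
    for (int i = 0; i < argc; i++) {
        /* One Bulk String per argument: "$<len>\r\n<data>\r\n". */
        n += snprintf(out + n, outsize - n, "$%zu\r\n%s\r\n",
                      strlen(argv[i]), argv[i]);
    }
    return n;
}
```

Encoding {"LLEN", "mylist"} with this helper produces exactly the *2\r\n$4\r\nLLEN\r\n$6\r\nmylist\r\n payload shown above.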
Multiple commands and pipelining
A client can use the same connection in order to issue multiple commands. Pipelining is supported so multiple commands can be sent with a single write operation by the client, without the need to read the server reply of the previous command before issuing the next one. All the replies can be read at the end.
For more information, see Pipelining.
Inline commands
Sometimes you may need to send a command
to the Redis server but only have telnet
available. While the Redis protocol is simple to implement, it is
not ideal to use in interactive sessions, and redis-cli
may not always be
available. For this reason, Redis also accepts commands in the inline command format.
The following is an example of a server/client chat using an inline command (the server chat starts with S:, the client chat with C:)
C: PING
S: +PONG
The following is an example of an inline command that returns an integer:
C: EXISTS somekey
S: :0
Basically, you write space-separated arguments in a telnet session.
Since no command starts with *, which is instead used in the unified request protocol, Redis is able to detect this condition and parse your command.
High performance parser for the Redis protocol
While the Redis protocol is human readable and easy to implement, it can be implemented with a performance similar to that of a binary protocol.
RESP uses prefixed lengths to transfer bulk data, so there is never a need to scan the payload for special characters, like with JSON, nor to quote the payload that needs to be sent to the server.
The Bulk and Multi Bulk lengths can be processed with code that performs a single operation per character while at the same time scanning for the CR character, like the following C code:
#include <stdio.h>
int main(void) {
    const char *p = "$123\r\n";
    int len = 0;
    p++; /* Skip the '$' type byte. */
    while(*p != '\r') {
        len = (len*10)+(*p - '0');
        p++;
    }
    /* Now p points at '\r', and len holds the bulk length. */
    printf("%d\n", len);
    return 0;
}
After the first CR is identified, it can be skipped along with the following LF without any processing. Then the bulk data can be read using a single read operation that does not inspect the payload in any way. Finally, the remaining CR and LF characters are discarded without any processing.
While comparable in performance to a binary protocol, the Redis protocol is significantly simpler to implement in most high-level languages, reducing the number of bugs in client software.
9 - Redis signal handling
This document provides information about how Redis reacts to different POSIX signals such as SIGTERM and SIGSEGV.
The information in this document only applies to Redis version 2.6 or greater.
SIGTERM and SIGINT
The SIGTERM and SIGINT signals tell Redis to shut down gracefully. When the server receives one of these signals, it does not exit immediately. Instead, it schedules a shutdown similar to the one performed by the SHUTDOWN command. The scheduled shutdown starts as soon as possible, specifically once the command currently in execution (if any) terminates, with a possible additional delay of 0.1 seconds or less.
If the server is blocked by a long-running Lua script, kill the script with SCRIPT KILL if possible. The scheduled shutdown will run just after the script is killed or terminates spontaneously.
This shutdown process includes the following actions:
- If there are any replicas lagging behind in replication:
  - Pause clients attempting to write with CLIENT PAUSE and the WRITE option.
  - Wait up to the configured shutdown-timeout (default 10 seconds) for replicas to catch up with the master’s replication offset.
- If a background child is saving the RDB file or performing an AOF rewrite, the child process is killed.
- If the AOF is active, Redis calls the fsync system call on the AOF file descriptor to flush the buffers on disk.
- If Redis is configured to persist on disk using RDB files, a synchronous (blocking) save is performed. Since the save is synchronous, it doesn’t use any additional memory.
- If the server is daemonized, the PID file is removed.
- If the Unix domain socket is enabled, it gets removed.
- The server exits with an exit code of zero.
If the RDB file can’t be saved, the shutdown fails, and the server continues to run in order to ensure no data loss.
Likewise, if the user just turned on AOF, and the server triggered the first AOF rewrite in order to create the initial AOF file but this file can’t be saved, the shutdown fails and the server continues to run.
Since Redis 2.6.11, no further attempt to shut down will be made unless a new SIGTERM is received or the SHUTDOWN command is issued.
Since Redis 7.0, the server waits for lagging replicas up to the configurable shutdown-timeout, 10 seconds by default, before shutting down.
This provides a best effort to minimize the risk of data loss in a situation where no save points are configured and AOF is deactivated.
Before version 7.0, shutting down a heavily loaded master node in a diskless setup was more likely to result in data loss.
To minimize the risk of data loss in such setups, trigger a manual FAILOVER (or CLUSTER FAILOVER) to demote the master to a replica and promote one of the replicas to a new master before shutting down a master node.
SIGSEGV, SIGBUS, SIGFPE and SIGILL
The following signals are handled as a Redis crash:
- SIGSEGV
- SIGBUS
- SIGFPE
- SIGILL
Once one of these signals is trapped, Redis stops any current operation and performs the following actions:
- Adds a bug report to the log file. This includes a stack trace, dump of registers, and information about the state of clients.
- Since Redis 2.8, a fast memory test is performed as a first check of the reliability of the crashing system.
- If the server was daemonized, the PID file is removed.
- Finally the server unregisters its own signal handler for the received signal and resends the same signal to itself to make sure that the default action is performed, such as dumping the core on the file system.
What happens when a child process gets killed
When the child performing the Append Only File rewrite gets killed by a signal, Redis handles this as an error and discards the (probably partial or corrupted) AOF file. It will attempt the rewrite again later.
When the child performing an RDB save is killed, Redis handles the condition as a more severe error. While the failure of an AOF file rewrite can cause AOF file enlargement, failed RDB file creation reduces durability.
When the child producing the RDB file is killed by a signal, or when the child exits with an error (non-zero exit code), Redis enters a special error condition where no further write commands are accepted.
- Redis will continue to reply to read commands.
- Redis will reply to all write commands with a MISCONFIG error.
This error condition will persist until it becomes possible to create an RDB file successfully.
Kill the RDB-saving child without errors
Sometimes the user may want to kill the RDB-saving child process without
generating an error. Since Redis version 2.6.10, this can be done using the signal SIGUSR1. This signal is handled in a special way: it kills the child process like any other signal, but the parent process will not detect this as a critical error and will continue to serve write requests.
10 - Sentinel client spec
Redis Sentinel is a monitoring solution for Redis instances that handles automatic failover of Redis masters and service discovery (who is the current master for a given group of instances?). Since Sentinel is both responsible for reconfiguring instances during failovers, and providing configurations to clients connecting to Redis masters or replicas, clients are required to have explicit support for Redis Sentinel.
This document is targeted at Redis client developers who want to support Sentinel in their client implementations, with the following goals:
- Automatic configuration of clients via Sentinel.
- Improved safety of Redis Sentinel automatic failover.
For details about how Redis Sentinel works, please check the Redis Documentation, as this document only contains information needed for Redis client developers, and it is expected that readers are familiar with the way Redis Sentinel works.
Redis service discovery via Sentinel
Redis Sentinel identifies every master with a name like “stats” or “cache”. Every name actually identifies a group of instances, composed of a master and a variable number of replicas.
The address of the Redis master that is used for a specific purpose inside a network may change after events like an automatic failover, a manually triggered failover (for instance in order to upgrade a Redis instance), and other reasons.
Normally Redis clients have some kind of hard-coded configuration that specifies the address of a Redis master instance within a network as IP address and port number. However if the master address changes, manual intervention in every client is needed.
A Redis client supporting Sentinel can automatically discover the address of a Redis master from the master name using Redis Sentinel. So instead of a hard coded IP address and port, a client supporting Sentinel should optionally be able to take as input:
- A list of ip:port pairs pointing to known Sentinel instances.
- The name of the service, like “cache” or “timelines”.
This is the procedure a client should follow in order to obtain the master address starting from the list of Sentinels and the service name.
Step 1: connecting to the first Sentinel
The client should iterate the list of Sentinel addresses. For every address it should try to connect to the Sentinel, using a short timeout (on the order of a few hundred milliseconds). On errors or timeouts, the next Sentinel address should be tried.
If all the Sentinel addresses were tried without success, an error should be returned to the client.
The first Sentinel replying to the client request should be put at the start of the list, so that at the next reconnection, we’ll try first the Sentinel that was reachable in the previous connection attempt, minimizing latency.
Step 2: ask for master address
Once a connection with a Sentinel is established, the client should try to execute the following command on the Sentinel:
SENTINEL get-master-addr-by-name master-name
Where master-name should be replaced with the actual service name specified by the user.
The result from this call can be one of the following two replies:
- An ip:port pair.
- A null reply. This means Sentinel does not know this master.
If an ip:port pair is received, this address should be used to connect to the Redis master. Otherwise if a null reply is received, the client should try the next Sentinel in the list.
Step 3: call the ROLE command in the target instance
Once the client has discovered the address of the master instance, it should attempt a connection with the master and call the ROLE command in order to verify that the role of the instance is actually a master.
If the ROLE command is not available (it was introduced in Redis 2.8.12), a client may resort to the INFO replication command, parsing the role: field of the output.
If the instance is not a master as expected, the client should wait a short amount of time (a few hundred milliseconds) and try again starting from Step 1.
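The loop over Steps 1 and 2 can be sketched as follows. The sentinel_query() function below is a toy stand-in for the network round trip of SENTINEL get-master-addr-by-name (a real client would open a connection with a short timeout, for example with a library such as hiredis); the addresses are made up, and the whole snippet is a hypothetical illustration of the selection logic, including the reordering of the responsive Sentinel to the front of the list:

```c
#include <string.h>

struct sentinel_addr { const char *host; int port; };

/* Stand-in for the network round trip of
 * SENTINEL get-master-addr-by-name <master-name>. This toy version
 * pretends the first Sentinel is unreachable and the second one knows
 * the master; a real client would send the actual command over a
 * connection opened with a short timeout. */
static int sentinel_query(const struct sentinel_addr *s,
                          const char *master_name,
                          struct sentinel_addr *out) {
    (void)master_name;
    if (strcmp(s->host, "10.0.0.1") == 0) return -1; /* timeout */
    out->host = "10.0.0.9";
    out->port = 6379;
    return 0;
}

/* Steps 1 and 2: iterate the Sentinel list, query each one, and move
 * the first responsive Sentinel to the front of the list so the next
 * reconnection tries it first. Returns 0 on success, -1 if every
 * Sentinel failed or replied with a null reply. */
int discover_master(struct sentinel_addr *sentinels, int count,
                    const char *master_name,
                    struct sentinel_addr *master) {
    for (int i = 0; i < count; i++) {
        if (sentinel_query(&sentinels[i], master_name, master) == 0) {
            /* Promote the responsive Sentinel to the head of the list. */
            struct sentinel_addr responsive = sentinels[i];
            memmove(&sentinels[1], &sentinels[0],
                    i * sizeof(responsive));
            sentinels[0] = responsive;
            return 0; /* Step 3 (ROLE verification) would follow here. */
        }
    }
    return -1; /* All Sentinels unreachable or unaware of this master. */
}
```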
Handling reconnections
Once the service name is resolved into the master address and a connection is established with the Redis master instance, every time a reconnection is needed the client should resolve the address again using Sentinels, restarting from Step 1. For instance, a Sentinel should be contacted again in the following cases:
- If the client reconnects after a timeout or socket error.
- If the client reconnects because it was explicitly closed or reconnected by the user.
In the above cases and any other case where the client lost the connection with the Redis server, the client should resolve the master address again.
Sentinel failover disconnection
Starting with Redis 2.8.12, when Redis Sentinel changes the configuration of
an instance, for example promoting a replica to a master, demoting a master to
replicate to the new master after a failover, or simply changing the master
address of a stale replica instance, it sends a CLIENT KILL type normal
command to the instance in order to make sure all the clients are disconnected
from the reconfigured instance. This will force clients to resolve the master
address again.
If the client contacts a Sentinel with not-yet-updated information, the verification of the Redis instance role via the ROLE command will fail, allowing the client to detect that the contacted Sentinel provided stale information, and to try again.
Note: it is possible that a stale master returns online at the same time a client contacts a stale Sentinel instance, so the client may connect with a stale master, and yet the ROLE output will match. However when the master is back again Sentinel will try to demote it to replica, triggering a new disconnection. The same reasoning applies to connecting to stale replicas that will get reconfigured to replicate with a different master.
Connecting to replicas
Sometimes clients are interested in connecting to replicas, for example in order to scale read requests. This protocol supports connecting to replicas by modifying Step 2 slightly. Instead of calling:
SENTINEL get-master-addr-by-name master-name
the client should instead call:
SENTINEL replicas master-name
in order to retrieve a list of replica instances.
Symmetrically, the client should verify with the ROLE command that the instance is actually a replica, in order to avoid scaling read queries with the master.
Connection pools
For clients implementing connection pools, on reconnection of a single connection, a Sentinel should be contacted again, and in case of a master address change, all the existing connections should be closed and reconnected to the new address.
Error reporting
The client should correctly return the information to the user in case of errors. Specifically:
- If no Sentinel can be contacted (so that the client was never able to get a reply to SENTINEL get-master-addr-by-name), an error that clearly states that Redis Sentinel is unreachable should be returned.
- If all the Sentinels in the pool replied with a null reply, the user should be informed with an error that Sentinels don’t know this master name.
Sentinels list automatic refresh
Optionally once a successful reply to get-master-addr-by-name
is received, a client may update its internal list of Sentinel nodes following this procedure:
- Obtain a list of other Sentinels for this master using the command SENTINEL sentinels <master-name>.
- Add every ip:port pair not already present in the list to the end of the list.
A client does not need to be able to persist this list by updating its own configuration. The ability to update the in-memory representation of the list of Sentinels is already useful to improve reliability.
Subscribe to Sentinel events to improve responsiveness
The Sentinel documentation shows how clients can connect to Sentinel instances using Pub/Sub in order to subscribe to changes in the Redis instances configurations.
This mechanism can be used to speed up the reconfiguration of clients: a client may listen to Pub/Sub in order to know when a configuration change happened, and then run the three-step protocol explained in this document to resolve the new Redis master (or replica) address.
However update messages received via Pub/Sub should not substitute the above procedure, since there is no guarantee that a client is able to receive all the update messages.
Additional information
For additional information or to discuss specific aspects of these guidelines, please drop a message to the Redis Google Group.
11 - Redis command arguments
The COMMAND DOCS
command returns documentation-focused information about available Redis commands.
The map reply that the command returns includes the arguments key.
This key stores an array that describes the command’s arguments.
Every element in the arguments array is a map with the following fields:
- name: the argument’s name, always present. The name of an argument is given for identification purposes alone. It isn’t displayed during the command’s syntax rendering.
- type: the argument’s type, always present.
An argument must have one of the following types:
- string: a string argument.
- integer: an integer argument.
- double: a double-precision argument.
- key: a string that represents the name of a key.
- pattern: a string that represents a glob-like pattern.
- unix-time: an integer that represents a Unix timestamp.
- pure-token: the argument is a token, meaning a reserved keyword, which may or may not be provided. Not to be confused with free-text user input.
- oneof: the argument is a container for nested arguments. This type enables choice among several nested arguments (see the XADD example below).
- block: the argument is a container for nested arguments. This type enables grouping arguments and applying a property (such as optional) to all (see the XADD example below).
- key-spec-index: this value is available for every argument of the key type. It is a 0-based index of the specification in the command’s key specifications that corresponds to the argument.
- token: a constant literal that precedes the argument (user input) itself.
- summary: a short description of the argument.
- since: the debut Redis version of the argument.
- flags: an array of argument flags.
Possible flags are:
- optional: denotes that the argument is optional (for example, the GET clause of the SET command).
- multiple: denotes that the argument may be repeated (such as the key argument of DEL).
- multiple-token: denotes the possible repetition of the argument with its preceding token (see SORT’s GET pattern clause).
- value: the argument’s value. For argument types other than oneof and block, this is a string that describes the value in the command’s syntax. For the oneof and block types, this is an array of nested arguments, each being a map as described in this section.
Example
The trimming clause of XADD, i.e., [MAXLEN|MINID [=|~] threshold [LIMIT count]], is represented at the top level as a block-typed argument.
It consists of four nested arguments:
- trimming strategy: this nested argument has a oneof type with two nested arguments. Each of the nested arguments, MAXLEN and MINID, is typed as pure-token.
- trimming operator: this nested argument is an optional oneof type with two nested arguments. Each of the nested arguments, = and ~, is a pure-token.
- threshold: this nested argument is a string.
- count: this nested argument is an optional integer with a token (LIMIT).
Here’s XADD’s arguments array:
1) 1) "name"
2) "key"
3) "type"
4) "key"
5) "value"
6) "key"
2) 1) "name"
2) "nomkstream"
3) "type"
4) "pure-token"
5) "token"
6) "NOMKSTREAM"
7) "since"
8) "6.2"
9) "flags"
10) 1) optional
3) 1) "name"
2) "trim"
3) "type"
4) "block"
5) "flags"
6) 1) optional
7) "value"
8) 1) 1) "name"
2) "strategy"
3) "type"
4) "oneof"
5) "value"
6) 1) 1) "name"
2) "maxlen"
3) "type"
4) "pure-token"
5) "token"
6) "MAXLEN"
2) 1) "name"
2) "minid"
3) "type"
4) "pure-token"
5) "token"
6) "MINID"
7) "since"
8) "6.2"
2) 1) "name"
2) "operator"
3) "type"
4) "oneof"
5) "flags"
6) 1) optional
7) "value"
8) 1) 1) "name"
2) "equal"
3) "type"
4) "pure-token"
5) "token"
6) "="
2) 1) "name"
2) "approximately"
3) "type"
4) "pure-token"
5) "token"
6) "~"
3) 1) "name"
2) "threshold"
3) "type"
4) "string"
5) "value"
6) "threshold"
4) 1) "name"
2) "count"
3) "type"
4) "integer"
5) "token"
6) "LIMIT"
7) "since"
8) "6.2"
9) "flags"
10) 1) optional
11) "value"
12) "count"
4) 1) "name"
2) "id_or_auto"
3) "type"
4) "oneof"
5) "value"
6) 1) 1) "name"
2) "auto_id"
3) "type"
4) "pure-token"
5) "token"
6) "*"
2) 1) "name"
2) "id"
3) "type"
4) "string"
5) "value"
6) "id"
5) 1) "name"
2) "field_value"
3) "type"
4) "block"
5) "flags"
6) 1) multiple
7) "value"
8) 1) 1) "name"
2) "field"
3) "type"
4) "string"
5) "value"
6) "field"
2) 1) "name"
2) "value"
3) "type"
4) "string"
5) "value"
6) "value"
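A structure like the one above can be walked programmatically to render a command’s syntax string. Below is a minimal sketch: the dictionaries transcribe the trimming clause from the reply, and the renderer handles only the types used here (for example, the multiple flag is ignored):

```python
def render(arg):
    """Render one argument from a COMMAND DOCS-style map into the
    syntax notation used above (| for oneof, [] for optional,
    token prefixes for token-prefixed values)."""
    t = arg["type"]
    if t == "pure-token":
        s = arg["token"]
    elif t == "oneof":
        s = "|".join(render(a) for a in arg["value"])
    elif t == "block":
        s = " ".join(render(a) for a in arg["value"])
    else:
        s = arg["value"]
        if "token" in arg:
            s = arg["token"] + " " + s
    if "optional" in arg.get("flags", []):
        s = "[" + s + "]"
    return s

# The XADD trimming clause, transcribed from the reply above:
trim = {
    "name": "trim", "type": "block", "flags": ["optional"],
    "value": [
        {"name": "strategy", "type": "oneof", "value": [
            {"name": "maxlen", "type": "pure-token", "token": "MAXLEN"},
            {"name": "minid", "type": "pure-token", "token": "MINID"}]},
        {"name": "operator", "type": "oneof", "flags": ["optional"], "value": [
            {"name": "equal", "type": "pure-token", "token": "="},
            {"name": "approximately", "type": "pure-token", "token": "~"}]},
        {"name": "threshold", "type": "string", "value": "threshold"},
        {"name": "count", "type": "integer", "token": "LIMIT",
         "flags": ["optional"], "value": "count"}]}

render(trim)  # -> "[MAXLEN|MINID [=|~] threshold [LIMIT count]]"
```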
12 - Redis command tips
Command tips are an array of strings. These provide Redis clients with additional information about the command. The information can instruct Redis Cluster clients as to how the command should be executed and its output processed in a clustered deployment.
Unlike the command’s flags (see the 3rd element of COMMAND’s reply), which are strictly internal to the server’s operation, tips don’t serve any purpose other than being reported to clients.
Command tips are arbitrary strings. However, the following sections describe proposed tips and demonstrate the conventions they are likely to adhere to.
nondeterministic-output
This tip indicates that the command’s output isn’t deterministic.
That means that calls to the command may yield different results with the same arguments and data.
That difference could be the result of the command’s random nature (e.g., RANDOMKEY
and SPOP
); the call’s timing (e.g. TTL
); or generic differences that relate to the server’s state (e.g. INFO
and CLIENT LIST
).
Note: prior to Redis 7.0, this tip was the random command flag.
nondeterministic-output-order
The existence of this tip indicates that the command’s output is deterministic, but its ordering is random (e.g. HGETALL
and SMEMBERS
).
Note: prior to Redis 7.0, this tip was the sort_for_script flag.
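A client-side sketch of how these two tips might be consumed, for example when diffing replies obtained from two replicas (the helper name and policy are illustrative, not part of any client library):

```python
def normalize_for_comparison(tips, reply):
    """Make a reply comparable across runs where the tips allow it.
    Returns None when the output is inherently nondeterministic."""
    if "nondeterministic-output" in tips:
        return None           # may differ even with identical data
    if "nondeterministic-output-order" in tips:
        return sorted(reply)  # deterministic content, random order
    return reply

# e.g. an SMEMBERS-style reply is only comparable after sorting:
normalize_for_comparison(["nondeterministic-output-order"], ["b", "a"])
```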
request_policy
This tip can help clients determine the shard(s) to send the command to in clustering mode. The default behavior a client should implement for commands without the request_policy tip is as follows:
- The command doesn’t accept key name arguments: the client can execute the command on an arbitrary shard.
- For commands that accept one or more key name arguments: the client should route the command to a single shard, as determined by the hash slot of the input keys.
In cases where the client should adopt a behavior different than the default, the request_policy tip can be one of:
- all_nodes: the client should execute the command on all nodes - masters and replicas alike. An example is the CONFIG SET command. This tip is in-use by commands that don’t accept key name arguments. The command operates atomically per shard.
- all_shards: the client should execute the command on all master shards (e.g., the DBSIZE command). This tip is in-use by commands that don’t accept key name arguments. The command operates atomically per shard.
- multi_shard: the client should execute the command on several shards. The shards that execute the command are determined by the hash slots of its input key name arguments. Examples of such commands include MSET, MGET and DEL. Note, however, that SUNIONSTORE isn’t considered multi_shard because all of its keys must belong to the same hash slot.
- special: indicates a non-trivial form of the client’s request policy, such as the SCAN command.
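The routing rules above can be condensed into a small dispatch helper. This is a sketch only: `master_of` maps a hash slot to its master node, key slots are assumed to be computed already by the client (e.g., CRC16 mod 16384, not shown), and the node names are made up:

```python
def plan_request(tips, key_slots, master_of, all_nodes):
    """Return the set of nodes a cluster client should send a command to,
    based on its request_policy tip (or the default rules)."""
    policy = next((t.split(":", 1)[1] for t in tips
                   if t.startswith("request_policy:")), None)
    if policy == "all_nodes":
        return set(all_nodes)                     # masters and replicas
    if policy == "all_shards":
        return set(master_of.values())            # every master shard
    if policy == "multi_shard":
        return {master_of[s] for s in key_slots}  # one shard per key slot
    # Default: no keys -> any single shard; keys -> the shard owning them.
    if not key_slots:
        return {next(iter(master_of.values()))}
    return {master_of[key_slots[0]]}

master_of = {0: "master-a", 1: "master-b"}
nodes = ["master-a", "master-b", "replica-a", "replica-b"]
plan_request(["request_policy:all_shards"], [], master_of, nodes)
# -> {"master-a", "master-b"}
```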
response_policy
This tip can help clients determine the aggregate they need to compute from the replies of multiple shards in a cluster. The default behavior for commands without a response_policy tip only applies to replies of nested types (i.e., an array, a set, or a map). The client’s implementation for the default behavior should be as follows:
- The command doesn’t accept key name arguments: the client can aggregate all replies within a single nested data structure. For example, the array replies we get from calling KEYS against all shards should be packed into a single array, in no particular order.
- For commands that accept one or more key name arguments: the client needs to retain the same order of replies as the input key names. For example, MGET’s aggregated reply.
The response_policy tip is set for commands that reply with scalar data types, or when it’s expected that clients implement a non-default aggregate. This tip can be one of:
- one_succeeded: the client should return success if at least one shard didn’t reply with an error. The client should reply with the first non-error reply it obtains. If all shards return an error, the client can reply with any one of these. For example, consider a SCRIPT KILL command that’s sent to all shards. Although the script should be loaded in all of the cluster’s shards, the SCRIPT KILL will typically run only on one at a given time.
- all_succeeded: the client should return successfully only if there are no error replies. Even a single error reply should disqualify the aggregate and be returned. Otherwise, the client should return one of the non-error replies. As an example, consider the CONFIG SET, SCRIPT FLUSH and SCRIPT LOAD commands.
- agg_logical_and: the client should return the result of a logical AND operation on all replies (only applies to integer replies, usually from commands that return either 0 or 1). Consider the SCRIPT EXISTS command as an example. It returns an array of 0’s and 1’s that denote the existence of its given SHA1 sums in the script cache. The aggregated response should be 1 only when all shards report that a given script SHA1 sum is in their respective cache.
- agg_logical_or: the client should return the result of a logical OR operation on all replies (only applies to integer replies, usually from commands that return either 0 or 1).
- agg_min: the client should return the minimal value from the replies (only applies to numerical replies). The aggregate reply from a cluster-wide WAIT command, for example, should be the minimal value (number of synchronized replicas) from all shards.
- agg_max: the client should return the maximal value from the replies (only applies to numerical replies).
- agg_sum: the client should return the sum of replies (only applies to numerical replies). Example: DBSIZE.
- special: this type of tip indicates a non-trivial form of reply policy. INFO is an excellent example of that.
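The aggregation policies above can be sketched as a single fold over per-shard replies. This is an illustration of the rules, not a real client implementation; replies are modeled as (value, is_error) pairs:

```python
def aggregate(policy, replies):
    """Fold per-shard replies into one, following a response_policy tip.
    `replies` is a list of (value, is_error) pairs."""
    errors = [v for v, err in replies if err]
    oks = [v for v, err in replies if not err]
    if policy == "one_succeeded":
        return oks[0] if oks else errors[0]  # first success, else any error
    if policy == "all_succeeded":
        return errors[0] if errors else oks[0]  # any error disqualifies
    if policy == "agg_logical_and":
        return int(all(oks))
    if policy == "agg_logical_or":
        return int(any(oks))
    if policy == "agg_min":
        return min(oks)
    if policy == "agg_max":
        return max(oks)
    if policy == "agg_sum":
        return sum(oks)
    raise ValueError("special or unknown policy: aggregate by hand")

# e.g. a cluster-wide DBSIZE sums the per-shard key counts:
aggregate("agg_sum", [(10, False), (32, False)])  # -> 42
```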
Example
redis> command info ping
1) 1) "ping"
2) (integer) -1
3) 1) fast
4) (integer) 0
5) (integer) 0
6) (integer) 0
7) 1) @fast
2) @connection
8) 1) "request_policy:all_shards"
2) "response_policy:all_succeeded"
9) (empty array)
10) (empty array)
13 - Optimizing Redis
13.1 - Redis benchmark
Redis includes the redis-benchmark utility that simulates running commands issued by N clients at the same time, sending M total queries. The utility provides a default set of tests, or a custom set of tests can be supplied.
The following options are supported:
Usage: redis-benchmark [-h <host>] [-p <port>] [-c <clients>] [-n <requests>] [-k <boolean>]
-h <hostname> Server hostname (default 127.0.0.1)
-p <port> Server port (default 6379)
-s <socket> Server socket (overrides host and port)
-a <password> Password for Redis Auth
-c <clients> Number of parallel connections (default 50)
-n <requests> Total number of requests (default 100000)
-d <size> Data size of SET/GET value in bytes (default 2)
--dbnum <db> SELECT the specified db number (default 0)
-k <boolean> 1=keep alive 0=reconnect (default 1)
-r <keyspacelen> Use random keys for SET/GET/INCR, random values for SADD
Using this option the benchmark will expand the string __rand_int__
inside an argument with a 12-digit number in the specified range
from 0 to keyspacelen-1. The substitution changes every time a command
is executed. Default tests use this to hit random keys in the
specified range.
-P <numreq> Pipeline <numreq> requests. Default 1 (no pipeline).
-q Quiet. Just show query/sec values
--csv Output in CSV format
-l Loop. Run the tests forever
-t <tests> Only run the comma separated list of tests. The test
names are the same as the ones produced as output.
-I Idle mode. Just open N idle connections and wait.
You need to have a running Redis instance before launching the benchmark. You can run the benchmarking utility like so:
redis-benchmark -q -n 100000
Running only a subset of the tests
You don’t need to run all the default tests every time you execute redis-benchmark. For example, to select only a subset of tests, use the -t option as in the following example:
$ redis-benchmark -t set,lpush -n 100000 -q
SET: 74239.05 requests per second
LPUSH: 79239.30 requests per second
This example runs the tests for the SET
and LPUSH
commands and uses quiet mode (see the -q
switch).
You can even benchmark a specific command:
$ redis-benchmark -n 100000 -q script load "redis.call('set','foo','bar')"
script load redis.call('set','foo','bar'): 69881.20 requests per second
Selecting the size of the key space
By default, the benchmark runs against a single key. In Redis the difference between such a synthetic benchmark and a real one is not huge since it is an in-memory system; however, it is possible to stress cache misses and, in general, to simulate a more real-world workload by using a large key space.
This is obtained by using the -r
switch. For instance if I want to run
one million SET operations, using a random key for every operation out of
100k possible keys, I’ll use the following command line:
$ redis-cli flushall
OK
$ redis-benchmark -t set -r 100000 -n 1000000
====== SET ======
1000000 requests completed in 13.86 seconds
50 parallel clients
3 bytes payload
keep alive: 1
99.76% `<=` 1 milliseconds
99.98% `<=` 2 milliseconds
100.00% `<=` 3 milliseconds
100.00% `<=` 3 milliseconds
72144.87 requests per second
$ redis-cli dbsize
(integer) 99993
Using pipelining
By default every client (the benchmark simulates 50 clients if not otherwise specified with -c) sends the next command only when the reply of the previous command is received. This means that the server will likely need a read call in order to read each command from every client, and the network round-trip time is paid for every command.
Redis supports pipelining, so it is possible to send multiple commands at once, a feature often exploited by real world applications. Redis pipelining is able to dramatically improve the number of operations per second a server is able to deliver.
This is an example of running the benchmark in a MacBook Air 11" using a pipelining of 16 commands:
$ redis-benchmark -n 1000000 -t set,get -P 16 -q
SET: 403063.28 requests per second
GET: 508388.41 requests per second
Using pipelining results in a significant increase in performance.
Pitfalls and misconceptions
The first point is obvious: the golden rule of a useful benchmark is to only compare apples to apples. You can compare different versions of Redis on the same workload, for instance, or the same version of Redis with different options. If you plan to compare Redis to something else, then it is important to evaluate the functional and technical differences, and take them into account.
- Redis is a server: all commands involve network or IPC round trips. It is meaningless to compare it to embedded data stores, because the cost of most operations is primarily in network/protocol management.
- Redis returns an acknowledgment for all usual commands. Some other data stores do not. Comparing Redis to stores involving one-way queries is only mildly useful.
- Naively iterating on synchronous Redis commands does not benchmark Redis itself, but rather measures your network (or IPC) latency and the client library’s intrinsic latency. To really test Redis, you need multiple connections (like redis-benchmark) and/or to use pipelining to aggregate several commands and/or multiple threads or processes.
- Redis is an in-memory data store with some optional persistence options. If you plan to compare it to transactional servers (MySQL, PostgreSQL, etc …), then you should consider activating AOF and decide on a suitable fsync policy.
- Redis is, mostly, a single-threaded server from the point of view of command execution (actually modern versions of Redis use threads for different things). It is not designed to benefit from multiple CPU cores. People are supposed to launch several Redis instances to scale out on several cores if needed. It is not really fair to compare one single Redis instance to a multi-threaded data store.
The redis-benchmark
program is a quick and useful way to get some figures and
evaluate the performance of a Redis instance on a given hardware. However,
by default, it does not represent the maximum throughput a Redis instance can
sustain. Actually, by using pipelining and a fast client (hiredis), it is fairly
easy to write a program generating more throughput than redis-benchmark. The
default behavior of redis-benchmark is to achieve throughput by exploiting
concurrency only (i.e. it creates several connections to the server).
It does not use pipelining or any parallelism at all (one pending query per connection at most, and no multi-threading), unless explicitly enabled via the -P parameter. So in some way, using redis-benchmark while, for example, triggering a BGSAVE operation in the background at the same time will provide the user with numbers closer to the worst case than to the best case.
To run a benchmark using pipelining mode (and achieve higher throughput), you need to explicitly use the -P option. Please note that it is still a realistic behavior since a lot of Redis based applications actively use pipelining to improve performance. However you should use a pipeline size that is more or less the average pipeline length you’ll be able to use in your application in order to get realistic numbers.
The benchmark should apply the same operations, and work in the same way with the multiple data stores you want to compare. It is absolutely pointless to compare the result of redis-benchmark to the result of another benchmark program and extrapolate.
For instance, Redis and memcached in single-threaded mode can be compared on GET/SET operations. Both are in-memory data stores, working mostly in the same way at the protocol level. Provided their respective benchmark application is aggregating queries in the same way (pipelining) and use a similar number of connections, the comparison is actually meaningful.
When you’re benchmarking a high-performance, in-memory database like Redis, it may be difficult to saturate the server. Sometimes, the performance bottleneck is on the client side, and not the server-side. In that case, the client (i.e., the benchmarking program itself) must be fixed, or perhaps scaled out, to reach the maximum throughput.
Factors impacting Redis performance
There are multiple factors having direct consequences on Redis performance. We mention them here, since they can alter the result of any benchmarks. Please note however, that a typical Redis instance running on a low end, untuned box usually provides good enough performance for most applications.
- Network bandwidth and latency usually have a direct impact on the performance. It is a good practice to use the ping program to quickly check that the latency between the client and server hosts is normal before launching the benchmark. Regarding the bandwidth, it is generally useful to estimate the throughput in Gbit/s and compare it to the theoretical bandwidth of the network. For instance a benchmark setting 4 KB strings in Redis at 100000 q/s would actually consume 3.2 Gbit/s of bandwidth and probably fit within a 10 Gbit/s link, but not a 1 Gbit/s one. In many real world scenarios, Redis throughput is limited by the network well before being limited by the CPU. To consolidate several high-throughput Redis instances on a single server, it is worth considering installing a 10 Gbit/s NIC or multiple 1 Gbit/s NICs with TCP/IP bonding.
- CPU is another very important factor. Being single-threaded, Redis favors fast CPUs with large caches and not many cores. At this game, Intel CPUs are currently the winners. It is not uncommon to get only half the performance on an AMD Opteron CPU compared to similar Nehalem EP/Westmere EP/Sandy Bridge Intel CPUs with Redis. When client and server run on the same box, the CPU is the limiting factor with redis-benchmark.
- Speed of RAM and memory bandwidth seem less critical for global performance especially for small objects. For large objects (>10 KB), it may become noticeable though. Usually, it is not really cost-effective to buy expensive fast memory modules to optimize Redis.
- Redis runs slower on a VM compared to running without virtualization using
the same hardware. If you have the chance to run Redis on a physical machine
this is preferred. However this does not mean that Redis is slow in
virtualized environments, the delivered performances are still very good
and most of the serious performance issues you may incur in virtualized
environments are due to over-provisioning, non-local disks with high latency,
or old hypervisor software that have slow
fork
syscall implementation. - When the server and client benchmark programs run on the same box, both the TCP/IP loopback and unix domain sockets can be used. Depending on the platform, unix domain sockets can achieve around 50% more throughput than the TCP/IP loopback (on Linux for instance). The default behavior of redis-benchmark is to use the TCP/IP loopback.
- The performance benefit of unix domain sockets compared to TCP/IP loopback tends to decrease when pipelining is heavily used (i.e. long pipelines).
- When an Ethernet network is used to access Redis, aggregating commands using pipelining is especially efficient when the size of the data is kept under the Ethernet packet size (about 1500 bytes). Actually, processing 10-byte, 100-byte, or 1000-byte queries results in almost the same throughput. See the graph below.
- On multi CPU sockets servers, Redis performance becomes dependent on the NUMA configuration and process location. The most visible effect is that redis-benchmark results seem non-deterministic because client and server processes are distributed randomly on the cores. To get deterministic results, it is required to use process placement tools (on Linux: taskset or numactl). The most efficient combination is always to put the client and server on two different cores of the same CPU to benefit from the L3 cache. Here are some results of 4 KB SET benchmark for 3 server CPUs (AMD Istanbul, Intel Nehalem EX, and Intel Westmere) with different relative placements. Please note this benchmark is not meant to compare CPU models between themselves (CPUs exact model and frequency are therefore not disclosed).
- With high-end configurations, the number of client connections is also an important factor. Being based on epoll/kqueue, the Redis event loop is quite scalable. Redis has already been benchmarked at more than 60000 connections, and was still able to sustain 50000 q/s in these conditions. As a rule of thumb, an instance with 30000 connections can only process half the throughput achievable with 100 connections. Here is an example showing the throughput of a Redis instance per number of connections:
- With high-end configurations, it is possible to achieve higher throughput by tuning the NIC(s) configuration and associated interruptions. Best throughput is achieved by setting an affinity between Rx/Tx NIC queues and CPU cores, and activating RPS (Receive Packet Steering) support. More information in this thread. Jumbo frames may also provide a performance boost when large objects are used.
- Depending on the platform, Redis can be compiled against different memory allocators (libc malloc, jemalloc, tcmalloc), which may have different behaviors in terms of raw speed, internal and external fragmentation. If you did not compile Redis yourself, you can use the INFO command to check the mem_allocator field. Please note most benchmarks do not run long enough to generate significant external fragmentation (contrary to production Redis instances).
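The bandwidth estimate in the first bullet above (4 KB payloads at 100000 q/s ≈ 3.2 Gbit/s) can be reproduced with one line of arithmetic; this sketch counts only payload bytes and ignores protocol overhead and replies, so it is a rough lower bound:

```python
def required_gbits(value_bytes, requests_per_sec):
    """Rough network throughput needed to carry SET payloads alone:
    payload bytes * queries/sec * 8 bits per byte, in Gbit/s."""
    return value_bytes * requests_per_sec * 8 / 1e9

# 4 KB values at 100000 q/s, as in the example above:
required_gbits(4000, 100_000)  # -> 3.2 (fits a 10 Gbit/s link, not 1 Gbit/s)
```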
Other things to consider
One important goal of any benchmark is to get reproducible results, so they can be compared to the results of other tests.
- A good practice is to try to run tests on isolated hardware as much as possible. If it is not possible, then the system must be monitored to check the benchmark is not impacted by some external activity.
- Some configurations (desktops and laptops for sure, some servers as well) have a variable CPU core frequency mechanism. The policy controlling this mechanism can be set at the OS level. Some CPU models are more aggressive than others at adapting the frequency of the CPU cores to the workload. To get reproducible results, it is better to set the highest possible fixed frequency for all the CPU cores involved in the benchmark.
- An important point is to size the system according to the benchmark. The system must have enough RAM and must not swap. On Linux, do not forget to set the overcommit_memory parameter correctly. Please note 32 and 64 bit Redis instances do not have the same memory footprint.
- Set Redis logging level (loglevel parameter) to warning or notice. Avoid putting the generated log file on a remote filesystem.
- Avoid using monitoring tools which can alter the result of the benchmark. For instance using INFO at regular interval to gather statistics is probably fine, but MONITOR will impact the measured performance significantly.
Other Redis benchmarking tools
There are several third-party tools that can be used for benchmarking Redis. Refer to each tool’s documentation for more information about its goals and capabilities.
- memtier_benchmark from Redis Ltd. is a NoSQL Redis and Memcache traffic generation and benchmarking tool.
- rpc-perf from Twitter is a tool for benchmarking RPC services that supports Redis and Memcache.
- YCSB from Yahoo is a benchmarking framework with clients to many databases, including Redis.
13.2 - Redis CPU profiling
Filling the performance checklist
Redis is developed with a great emphasis on performance. We do our best with every release to make sure you’ll experience a very stable and fast product.
Nevertheless, if you’re finding room to improve the efficiency of Redis or are pursuing a performance regression investigation you will need a concise methodical way of monitoring and analyzing Redis performance.
To do so you can rely on different methodologies (some more suited than others, depending on the class of issues/analysis we intend to make). A curated list of methodologies and their steps is enumerated by Brendan Gregg at the following link.
We recommend the Utilization Saturation and Errors (USE) Method for answering the question of what is your bottleneck. Check the following mapping between system resource, metric, and tools for a practical deep dive: USE method.
Ensuring the CPU is your bottleneck
This guide assumes you’ve followed one of the above methodologies to perform a complete check of system health, and identified the bottleneck being the CPU. If you have identified that most of the time is spent blocked on I/O, locks, timers, paging/swapping, etc., this guide is not for you.
Build Prerequisites
For a proper On-CPU analysis, Redis (and any dynamically loaded library like Redis Modules) requires stack traces to be available to tracers, which you may need to fix first.
By default, Redis is compiled with the -O2 switch (which we intend to keep during profiling). This means that compiler optimizations are enabled. Many compilers omit the frame pointer as a runtime optimization (saving a register), thus breaking frame pointer-based stack walking. This makes the Redis executable faster, but at the same time it makes Redis (like any other program) harder to trace, potentially wrongfully pinpointing on-CPU time to the last available frame pointer of a call stack that can get a lot deeper (but impossible to trace).
It’s important that you ensure that:
- debug information is present: compile option
-g
- frame pointer register is present:
-fno-omit-frame-pointer
- we still run with optimizations to get an accurate representation of production run times, meaning we will keep:
-O2
You can do it as follows within redis main repo:
$ make REDIS_CFLAGS="-g -fno-omit-frame-pointer"
A set of instruments to identify performance regressions and/or potential on-CPU performance improvements
This document focuses specifically on on-CPU resource bottlenecks analysis, meaning we’re interested in understanding where threads are spending CPU cycles while running on-CPU and, as importantly, whether those cycles are effectively being used for computation or stalled waiting (not blocked!) for memory I/O, and cache misses, etc.
For that we will rely on toolkits (perf, bcc tools), and hardware specific PMCs (Performance Monitoring Counters), to proceed with:
- Hotspot analysis (perf or bcc tools): to profile code execution and determine which functions are consuming the most time and thus are targets for optimization. We’ll present two options to collect, report, and visualize hotspots, either with perf or with bcc/BPF tracing tools.
- Call counts analysis: to count events, including function calls, enabling us to correlate several calls/components at once, relying on bcc/BPF tracing tools.
- Hardware event sampling: crucial for understanding CPU behavior, including memory I/O, stall cycles, and cache misses.
Tool prerequisites
The following steps rely on Linux perf_events (aka “perf”), bcc/BPF tracing tools, and Brendan Gregg’s FlameGraph repo.
We assume beforehand you have:
- Installed the perf tool on your system. Most Linux distributions will likely package this as a package related to the kernel. More information about the perf tool can be found at perf wiki.
- Followed the install bcc/BPF instructions to install bcc toolkit on your machine.
- Cloned Brendan Gregg’s FlameGraph repo and made accessible the difffolded.pl and flamegraph.pl files, to generate the collapsed stack traces and Flame Graphs.
Hotspot analysis with perf or eBPF (stack traces sampling)
Profiling CPU usage by sampling stack traces at a timed interval is a fast and easy way to identify performance-critical code sections (hotspots).
Sampling stack traces using perf
To profile both user- and kernel-level stacks of redis-server for a specific length of time, for example 60 seconds, at a sampling frequency of 999 samples per second:
$ perf record -g --pid $(pgrep redis-server) -F 999 -- sleep 60
Displaying the recorded profile information using perf report
By default perf record will generate a perf.data file in the current working directory.
You can then report with a call-graph output (call chain, stack backtrace), with a minimum call graph inclusion threshold of 0.5%, with:
$ perf report -g "graph,0.5,caller"
See the perf report documentation for advanced filtering, sorting and aggregation capabilities.
Visualizing the recorded profile information using Flame Graphs
Flame graphs allow for a quick and accurate visualization of frequent code-paths. They can be generated using Brendan Gregg’s open source programs on github, which create interactive SVGs from folded stack files.
Specifically, for perf we need to convert the generated perf.data into the captured stacks, and fold each of them into single lines. You can then render the on-CPU flame graph with:
$ perf script > redis.perf.stacks
$ stackcollapse-perf.pl redis.perf.stacks > redis.folded.stacks
$ flamegraph.pl redis.folded.stacks > redis.svg
By default, perf script reads the perf.data file in the current working directory. See the perf script documentation for advanced usage.
See FlameGraph usage options for more advanced stack trace visualizations (like the differential one).
Archiving and sharing recorded profile information
So that analysis of the perf.data contents can be performed on a machine other than the one on which collection happened, you need to export, along with the perf.data file, all object files with build-ids found in the record data file. This can be easily done with the help of the perf-archive.sh script:
$ perf-archive.sh perf.data
Now please run:
$ tar xvf perf.data.tar.bz2 -C ~/.debug
on the machine where you need to run perf report
.
Sampling stack traces using bcc/BPF’s profile
Similarly to perf, as of Linux kernel 4.9, BPF-optimized profiling is now fully available with the promise of lower overhead on CPU (as stack traces are frequency counted in kernel context) and disk I/O resources during profiling.
Apart from that, relying solely on bcc/BPF’s profile tool removes the perf.data file and intermediate steps if stack trace analysis is our main goal. You can use bcc’s profile tool to output folded format directly, for flame graph generation:
$ /usr/share/bcc/tools/profile -F 999 -f --pid $(pgrep redis-server) --duration 60 > redis.folded.stacks
In that manner, we've removed any preprocessing and can render the on-CPU flame graph with a single command:
$ flamegraph.pl redis.folded.stacks > redis.svg
Call counts analysis with bcc/BPF
A function may consume significant CPU cycles either because its code is slow
or because it’s frequently called. To answer at what rate functions are being
called, you can rely upon call counts analysis using BCC’s funccount
tool:
$ /usr/share/bcc/tools/funccount 'redis-server:(call*|*Read*|*Write*)' --pid $(pgrep redis-server) --duration 60
Tracing 64 functions for "redis-server:(call*|*Read*|*Write*)"... Hit Ctrl-C to end.
FUNC COUNT
call 334
handleClientsWithPendingWrites 388
clientInstallWriteHandler 388
postponeClientRead 514
handleClientsWithPendingReadsUsingThreads 735
handleClientsWithPendingWritesUsingThreads 735
prepareClientToWrite 1442
Detaching...
The above output shows that, while tracing, Redis's call() function was called 334 times, handleClientsWithPendingWrites() 388 times, etc.
Hardware event counting with Performance Monitoring Counters (PMCs)
Many modern processors contain a performance monitoring unit (PMU) exposing Performance Monitoring Counters (PMCs). PMCs are crucial for understanding CPU behavior, including memory I/O, stall cycles, and cache misses, and provide low-level CPU performance statistics that aren’t available anywhere else.
The design and functionality of a PMU is CPU-specific; you should assess your CPU's supported counters and features by using perf list.
To count, over a duration of 60 seconds and specifically for the Redis process: instructions per cycle, micro-ops executed, cycles during which no micro-ops were dispatched, and stalled cycles on memory (broken down per memory type):
$ perf stat -e "cpu-clock,cpu-cycles,instructions,uops_executed.core,uops_executed.stall_cycles,cache-references,cache-misses,cycle_activity.stalls_total,cycle_activity.stalls_mem_any,cycle_activity.stalls_l3_miss,cycle_activity.stalls_l2_miss,cycle_activity.stalls_l1d_miss" --pid $(pgrep redis-server) -- sleep 60
Performance counter stats for process id '3038':
60046.411437 cpu-clock (msec) # 1.001 CPUs utilized
168991975443 cpu-cycles # 2.814 GHz (36.40%)
388248178431 instructions # 2.30 insn per cycle (45.50%)
443134227322 uops_executed.core # 7379.862 M/sec (45.51%)
30317116399 uops_executed.stall_cycles # 504.895 M/sec (45.51%)
670821512 cache-references # 11.172 M/sec (45.52%)
23727619 cache-misses # 3.537 % of all cache refs (45.43%)
30278479141 cycle_activity.stalls_total # 504.251 M/sec (36.33%)
19981138777 cycle_activity.stalls_mem_any # 332.762 M/sec (36.33%)
725708324 cycle_activity.stalls_l3_miss # 12.086 M/sec (36.33%)
8487905659 cycle_activity.stalls_l2_miss # 141.356 M/sec (36.32%)
10011909368 cycle_activity.stalls_l1d_miss # 166.736 M/sec (36.31%)
60.002765665 seconds time elapsed
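The ratios perf prints (such as "insn per cycle" and the cache miss percentage) are derived directly from the raw counters. A quick sketch, using the numbers from the sample output above:

```python
# Derive the headline ratios from the raw perf stat counters above.
cpu_cycles = 168_991_975_443
instructions = 388_248_178_431
cache_references = 670_821_512
cache_misses = 23_727_619

ipc = instructions / cpu_cycles                     # perf reports 2.30 insn per cycle
miss_rate = 100 * cache_misses / cache_references   # perf reports 3.537 % of all cache refs

print(f"{ipc:.2f} insn per cycle, {miss_rate:.3f} % cache miss rate")
```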
It's important to know that there are two very different ways in which PMCs can be used (counting and sampling); we've focused solely on PMC counting for the sake of this analysis. Brendan Gregg clearly explains the difference in the following link.
13.3 - Diagnosing latency issues
This document will help you understand what the problem could be if you are experiencing latency problems with Redis.
In this context latency is the maximum delay between the time a client issues a command and the time the reply to the command is received by the client. Usually Redis processing time is extremely low, in the sub microsecond range, but there are certain conditions leading to higher latency figures.
I’ve little time, give me the checklist
The following documentation is very important in order to run Redis in a low latency fashion. However I understand that we are busy people, so let's start with a quick checklist. If these steps fail to solve your problem, please return here to read the full documentation.
- Make sure you are not running slow commands that are blocking the server. Use the Redis Slow Log feature to check this.
- For EC2 users, make sure you use HVM based modern EC2 instances, like m3.medium. Otherwise fork() is too slow.
- Transparent huge pages must be disabled in your kernel. Use echo never > /sys/kernel/mm/transparent_hugepage/enabled to disable them, and restart your Redis process.
- If you are using a virtual machine, it is possible that you have an intrinsic latency that has nothing to do with Redis. Check the minimum latency you can expect from your runtime environment using ./redis-cli --intrinsic-latency 100. Note: you need to run this command on the server, not on the client.
- Enable and use the Latency monitor feature of Redis in order to get a human readable description of the latency events and causes in your Redis instance.
In general, use the following table for durability vs. latency/performance tradeoffs, ordered from stronger safety to better latency.
- AOF + fsync always: this is very slow, you should use it only if you know what you are doing.
- AOF + fsync every second: this is a good compromise.
- AOF + fsync every second + no-appendfsync-on-rewrite option set to yes: this is like the above, but avoids fsyncing during rewrites to lower disk pressure.
- AOF + fsync never: fsyncing is left to the kernel in this setup, with even less disk pressure and risk of latency spikes.
- RDB. Here you have a vast spectrum of tradeoffs depending on the save triggers you configure.
And now for people with 15 minutes to spend, the details…
Measuring latency
If you are experiencing latency problems, you probably know how to measure it in the context of your application, or maybe your latency problem is very evident even macroscopically. However redis-cli can be used to measure the latency of a Redis server in milliseconds, just try:
redis-cli --latency -h `host` -p `port`
Using the internal Redis latency monitoring subsystem
Since Redis 2.8.13, Redis provides latency monitoring capabilities that are able to sample different execution paths to understand where the server is blocking. This makes debugging of the problems illustrated in this documentation much simpler, so we suggest enabling latency monitoring ASAP. Please refer to the Latency monitor documentation.
While the latency monitoring sampling and reporting capabilities will make it simpler to understand the source of latency in your Redis system, it is still advised that you read this documentation extensively to better understand the topic of Redis and latency spikes.
Latency baseline
There is a kind of latency that is inherently part of the environment where you run Redis, that is the latency provided by your operating system kernel and, if you are using virtualization, by the hypervisor you are using.
While this latency can't be removed, it is important to study it because it is the baseline: in other words, you won't be able to achieve a Redis latency that is better than the latency every process running in your environment experiences because of the kernel or hypervisor implementation or setup.
We call this kind of latency intrinsic latency, and redis-cli, starting from Redis version 2.8.7, is able to measure it. This is an example run under Linux 3.11.0 running on an entry level server.
Note: the argument 100
is the number of seconds the test will be executed.
The more time we run the test, the more likely we’ll be able to spot
latency spikes. 100 seconds is usually appropriate, however you may want
to perform a few runs at different times. Please note that the test is CPU
intensive and will likely saturate a single core in your system.
$ ./redis-cli --intrinsic-latency 100
Max latency so far: 1 microseconds.
Max latency so far: 16 microseconds.
Max latency so far: 50 microseconds.
Max latency so far: 53 microseconds.
Max latency so far: 83 microseconds.
Max latency so far: 115 microseconds.
Note: redis-cli in this special case needs to run on the server where you run or plan to run Redis, not on the client. In this special mode redis-cli does not connect to a Redis server at all: it will just try to measure the largest time the kernel does not provide CPU time to the redis-cli process itself.
In the above example, the intrinsic latency of the system is just 0.115 milliseconds (or 115 microseconds), which is good news; however, keep in mind that the intrinsic latency may change over time depending on the load of the system.
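The idea behind --intrinsic-latency can be sketched in a few lines: spin in a tight loop and record the largest gap between consecutive clock reads, which approximates the longest time the scheduler kept the process off the CPU. A simplified Python sketch (not the actual redis-cli implementation):

```python
import time

def intrinsic_latency(duration_s=1.0):
    """Busy-loop and track the largest gap between consecutive clock
    reads; a big gap means the kernel paused our process."""
    deadline = time.monotonic() + duration_s
    max_gap = 0.0
    prev = time.monotonic()
    while prev < deadline:
        now = time.monotonic()
        gap = now - prev
        if gap > max_gap:
            max_gap = gap
        prev = now
    return max_gap * 1_000_000  # microseconds

print(f"Max latency so far: {intrinsic_latency(0.5):.0f} microseconds.")
```

Like the real tool, this is CPU intensive and will saturate one core while it runs.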
Virtualized environments will show less favorable numbers, especially with high load or noisy neighbors. The following is a run on a Linode 4096 instance running Redis and Apache:
$ ./redis-cli --intrinsic-latency 100
Max latency so far: 573 microseconds.
Max latency so far: 695 microseconds.
Max latency so far: 919 microseconds.
Max latency so far: 1606 microseconds.
Max latency so far: 3191 microseconds.
Max latency so far: 9243 microseconds.
Max latency so far: 9671 microseconds.
Here we have an intrinsic latency of 9.7 milliseconds: this means that we can't expect better than that from Redis. However other runs at different times, in different virtualization environments, with higher load or with noisy neighbors can easily show even worse values. We were able to measure up to 40 milliseconds in systems otherwise apparently running normally.
Latency induced by network and communication
Clients connect to Redis using a TCP/IP connection or a Unix domain connection. The typical latency of a 1 Gbit/s network is about 200 us, while the latency with a Unix domain socket can be as low as 30 us. It actually depends on your network and system hardware. On top of the communication itself, the system adds some more latency (due to thread scheduling, CPU caches, NUMA placement, etc …). System induced latencies are significantly higher on a virtualized environment than on a physical machine.
The consequence is that even if Redis processes most commands in the sub-microsecond range, a client performing many roundtrips to the server will have to pay for these network and system related latencies.
An efficient client will therefore try to limit the number of roundtrips by pipelining several commands together. This is fully supported by the servers and most clients. Aggregated commands like MSET/MGET can be also used for that purpose. Starting with Redis 2.4, a number of commands also support variadic parameters for all data types.
Here are some guidelines:
- If you can afford it, prefer a physical machine over a VM to host the server.
- Do not systematically connect/disconnect to the server (especially true for web based applications). Keep your connections as long lived as possible.
- If your client is on the same host as the server, use Unix domain sockets.
- Prefer to use aggregated commands (MSET/MGET), or commands with variadic parameters (if possible) over pipelining.
- Prefer to use pipelining (if possible) over a sequence of roundtrips.
- Redis supports Lua server-side scripting to cover cases that are not suitable for raw pipelining (for instance when the result of a command is an input for the following commands).
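The benefit of pipelining over sequential roundtrips is easy to quantify: N sequential commands pay N network roundtrips, while a pipeline pays roughly one roundtrip for the whole batch. A back-of-the-envelope sketch (the 200 us figure comes from the 1 Gbit/s estimate above; the 1 us per-command time is an illustrative assumption):

```python
def total_latency_us(n_commands, rtt_us, per_cmd_us, pipelined):
    """Rough model: sequential commands pay one roundtrip each,
    while a pipeline pays a single roundtrip for the whole batch."""
    roundtrips = 1 if pipelined else n_commands
    return roundtrips * rtt_us + n_commands * per_cmd_us

# 1000 commands over a ~200 us RTT network, ~1 us server time each
sequential = total_latency_us(1000, 200, 1, pipelined=False)  # 201000 us
pipelined = total_latency_us(1000, 200, 1, pipelined=True)    # 1200 us
print(sequential, pipelined)
```

Under these assumptions the pipelined batch is over 160 times faster, which is why limiting roundtrips matters more than server-side processing time.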
On Linux, some people can achieve better latencies by playing with process placement (taskset), cgroups, real-time priorities (chrt), NUMA configuration (numactl), or by using a low-latency kernel. Please note vanilla Redis is not really suitable to be bound to a single CPU core. Redis can fork background tasks that can be extremely CPU consuming, like BGSAVE or BGREWRITEAOF. These tasks must never run on the same core as the main event loop.
In most situations, these kind of system level optimizations are not needed. Only do them if you require them, and if you are familiar with them.
Single threaded nature of Redis
Redis uses a mostly single threaded design. This means that a single process serves all the client requests, using a technique called multiplexing. This means that Redis can serve a single request at any given moment, so all the requests are served sequentially. This is very similar to how Node.js works as well. However, neither product is often perceived as being slow. This is caused in part by the small amount of time needed to complete a single request, but primarily because these products are designed to not block on system calls, such as reading data from or writing data to a socket.
I said that Redis is mostly single threaded because, starting from Redis 2.4, we use threads in Redis to perform some slow I/O operations in the background, mainly related to disk I/O; but this does not change the fact that Redis serves all requests using a single thread.
Latency generated by slow commands
A consequence of being single threaded is that when a request is slow to serve, all the other clients will wait for it to complete. When executing normal commands, like GET or SET or LPUSH, this is not a problem at all since these commands are executed in constant (and very small) time. However there are commands operating on many elements, like SORT, LREM, SUNION and others. For instance, taking the intersection of two big sets can take a considerable amount of time.
The algorithmic complexity of all commands is documented. A good practice is to systematically check it when using commands you are not familiar with.
If you have latency concerns you should either not use slow commands against values composed of many elements, or you should run a replica using Redis replication where you run all your slow queries.
It is possible to monitor slow commands using the Redis Slow Log feature.
Additionally, you can use your favorite per-process monitoring program (top, htop, prstat, etc …) to quickly check the CPU consumption of the main Redis process. If it is high while the traffic is not, it is usually a sign that slow commands are used.
IMPORTANT NOTE: a VERY common source of latency generated by the execution of slow commands is the use of the KEYS command in production environments. KEYS, as documented in the Redis documentation, should only be used for debugging purposes. Since Redis 2.8, new commands were introduced to iterate the key space and other large collections incrementally; please check the SCAN, SSCAN, HSCAN and ZSCAN commands for more information.
Latency generated by fork
In order to generate the RDB file in background, or to rewrite the Append Only File if AOF persistence is enabled, Redis has to fork background processes. The fork operation (running in the main thread) can induce latency by itself.
Forking is an expensive operation on most Unix-like systems, since it involves copying a good number of objects linked to the process. This is especially true for the page table associated to the virtual memory mechanism.
For instance on a Linux/AMD64 system, the memory is divided into 4 kB pages. To convert virtual addresses to physical addresses, each process stores a page table (actually represented as a tree) containing at least a pointer per page of the address space of the process. So a large 24 GB Redis instance requires a page table of 24 GB / 4 kB * 8 = 48 MB.
When a background save is performed, this instance will have to be forked, which will involve allocating and copying 48 MB of memory. It takes time and CPU, especially on virtual machines where allocation and initialization of a large memory chunk can be expensive.
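The page-table arithmetic above generalizes to any instance size; a quick sketch (assuming 4 kB pages and 8-byte page-table entries, as in the example):

```python
def page_table_size_mb(instance_gb, page_kb=4, entry_bytes=8):
    """Approximate size of the page table that fork() must copy:
    one entry per page of the process address space."""
    pages = instance_gb * 1024 * 1024 // page_kb
    return pages * entry_bytes / (1024 * 1024)

print(page_table_size_mb(24))  # 48.0, matching the 24 GB example above
```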
Fork time in different systems
Modern hardware is pretty fast at copying the page table, but Xen is not.
The problem with Xen is not virtualization-specific, but Xen-specific. For instance, using VMware or VirtualBox does not result in slow fork times.
The following is a table that compares fork times for different Redis instance sizes. Data is obtained by performing a BGSAVE and looking at the latest_fork_usec field in the INFO command output.
However the good news is that newer EC2 HVM based instance types are much better with fork times, almost on par with physical servers, so for example using m3.medium (or better) instances will provide good results.
- Linux beefy VM on VMware 6.0GB RSS forked in 77 milliseconds (12.8 milliseconds per GB).
- Linux running on physical machine (Unknown HW) 6.1GB RSS forked in 80 milliseconds (13.1 milliseconds per GB)
- Linux running on physical machine (Xeon @ 2.27 GHz) 6.9GB RSS forked in 62 milliseconds (9 milliseconds per GB).
- Linux VM on 6sync (KVM) 360 MB RSS forked in 8.2 milliseconds (23.3 milliseconds per GB).
- Linux VM on EC2, old instance types (Xen) 6.1GB RSS forked in 1460 milliseconds (239.3 milliseconds per GB).
- Linux VM on EC2, new instance types (Xen) 1GB RSS forked in 10 milliseconds (10 milliseconds per GB).
- Linux VM on Linode (Xen) 0.9 GB RSS forked in 382 milliseconds (424 milliseconds per GB).
As you can see, certain VMs running on Xen have a performance hit of between one and two orders of magnitude. For EC2 users the suggestion is simple: use modern HVM based instances.
Latency induced by transparent huge pages
Unfortunately, when a Linux kernel has transparent huge pages enabled, Redis incurs a big latency penalty after the fork call is used in order to persist on disk. Huge pages are the cause of the following issue:
- Fork is called, two processes with shared huge pages are created.
- In a busy instance, a few event loop runs will cause commands to target a few thousand pages, causing the copy on write of almost the whole process memory.
- This will result in big latency and big memory usage.
Make sure to disable transparent huge pages using the following command:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
Latency induced by swapping (operating system paging)
Linux (and many other modern operating systems) is able to relocate memory pages from the memory to the disk, and vice versa, in order to use the system memory efficiently.
If a Redis page is moved by the kernel from the memory to the swap file, when the data stored in this memory page is used by Redis (for example accessing a key stored into this memory page) the kernel will stop the Redis process in order to move the page back into the main memory. This is a slow operation involving random I/Os (compared to accessing a page that is already in memory) and will result into anomalous latency experienced by Redis clients.
The kernel relocates Redis memory pages on disk mainly because of three reasons:
- The system is under memory pressure since the running processes are demanding more physical memory than the amount that is available. The simplest instance of this problem is simply Redis using more memory than is available.
- The Redis instance data set, or part of the data set, is mostly idle (never accessed by clients), so the kernel could swap idle memory pages to disk. This problem is very rare since even a moderately slow instance will touch all the memory pages often, forcing the kernel to retain all the pages in memory.
- Some processes are generating massive read or write I/Os on the system. Because files are generally cached, it tends to put pressure on the kernel to increase the filesystem cache, and therefore generate swapping activity. Please note it includes Redis RDB and/or AOF background threads which can produce large files.
Fortunately Linux offers good tools to investigate the problem, so when latency due to swapping is suspected, the simplest thing to do is to check whether this is the case.
The first thing to do is to check the amount of Redis memory that is swapped to disk. In order to do so you need to obtain the Redis instance pid:
$ redis-cli info | grep process_id
process_id:5454
Now enter the /proc file system directory for this process:
$ cd /proc/5454
Here you’ll find a file called smaps that describes the memory layout of the Redis process (assuming you are using Linux 2.6.16 or newer). This file contains very detailed information about our process memory maps, and one field called Swap is exactly what we are looking for. However there is not just a single swap field since the smaps file contains the different memory maps of our Redis process (The memory layout of a process is more complex than a simple linear array of pages).
Since we are interested in all the memory swapped by our process, the first thing to do is to grep for the Swap field across the whole file:
$ cat smaps | grep 'Swap:'
Swap: 0 kB
Swap: 0 kB
Swap: 0 kB
Swap: 0 kB
Swap: 0 kB
Swap: 12 kB
Swap: 156 kB
Swap: 8 kB
Swap: 0 kB
Swap: 0 kB
Swap: 0 kB
Swap: 0 kB
Swap: 0 kB
Swap: 0 kB
Swap: 0 kB
Swap: 0 kB
Swap: 0 kB
Swap: 4 kB
Swap: 0 kB
Swap: 0 kB
Swap: 4 kB
Swap: 0 kB
Swap: 0 kB
Swap: 4 kB
Swap: 4 kB
Swap: 0 kB
Swap: 0 kB
Swap: 0 kB
Swap: 0 kB
Swap: 0 kB
If everything is 0 kB, or if there are sporadic 4 kB entries, everything is perfectly normal. Actually in our example instance (a real web site running Redis and serving hundreds of users every second) there are a few entries that show more swapped pages. To investigate whether this is a serious problem or not, we change our command in order to also print the size of each memory map:
$ cat smaps | egrep '^(Swap|Size)'
Size: 316 kB
Swap: 0 kB
Size: 4 kB
Swap: 0 kB
Size: 8 kB
Swap: 0 kB
Size: 40 kB
Swap: 0 kB
Size: 132 kB
Swap: 0 kB
Size: 720896 kB
Swap: 12 kB
Size: 4096 kB
Swap: 156 kB
Size: 4096 kB
Swap: 8 kB
Size: 4096 kB
Swap: 0 kB
Size: 4 kB
Swap: 0 kB
Size: 1272 kB
Swap: 0 kB
Size: 8 kB
Swap: 0 kB
Size: 4 kB
Swap: 0 kB
Size: 16 kB
Swap: 0 kB
Size: 84 kB
Swap: 0 kB
Size: 4 kB
Swap: 0 kB
Size: 4 kB
Swap: 0 kB
Size: 8 kB
Swap: 4 kB
Size: 8 kB
Swap: 0 kB
Size: 4 kB
Swap: 0 kB
Size: 4 kB
Swap: 4 kB
Size: 144 kB
Swap: 0 kB
Size: 4 kB
Swap: 0 kB
Size: 4 kB
Swap: 4 kB
Size: 12 kB
Swap: 4 kB
Size: 108 kB
Swap: 0 kB
Size: 4 kB
Swap: 0 kB
Size: 4 kB
Swap: 0 kB
Size: 272 kB
Swap: 0 kB
Size: 4 kB
Swap: 0 kB
As you can see from the output, there is a map of 720896 kB (with just 12 kB swapped) and 156 kB more swapped in another map: basically a very small amount of our memory is swapped so this is not going to create any problem at all.
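Instead of eyeballing the output, you can total the swapped memory across all maps. A small sketch (the helper name is hypothetical; feed it the contents of /proc/&lt;pid&gt;/smaps):

```python
def total_swapped_kb(smaps_text):
    """Sum all 'Swap:' fields from the contents of /proc/<pid>/smaps."""
    total = 0
    for line in smaps_text.splitlines():
        if line.startswith("Swap:"):
            total += int(line.split()[1])  # the value is expressed in kB
    return total

# Abbreviated sample in the same format as the output above
sample = "Size: 720896 kB\nSwap: 12 kB\nSize: 4096 kB\nSwap: 156 kB\n"
print(total_swapped_kb(sample))  # 168
```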
If instead a non trivial amount of the process memory is swapped on disk your latency problems are likely related to swapping. If this is the case with your Redis instance you can further verify it using the vmstat command:
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 3980 697932 147180 1406456 0 0 2 2 2 0 4 4 91 0
0 0 3980 697428 147180 1406580 0 0 0 0 19088 16104 9 6 84 0
0 0 3980 697296 147180 1406616 0 0 0 28 18936 16193 7 6 87 0
0 0 3980 697048 147180 1406640 0 0 0 0 18613 15987 6 6 88 0
2 0 3980 696924 147180 1406656 0 0 0 0 18744 16299 6 5 88 0
0 0 3980 697048 147180 1406688 0 0 0 4 18520 15974 6 6 88 0
^C
The interesting part of the output for our needs are the two columns si and so, which count the amount of memory swapped from/to the swap file. If you see non-zero counts in those two columns then there is swapping activity in your system.
Finally, the iostat command can be used to check the global I/O activity of the system.
$ iostat -xk 1
avg-cpu: %user %nice %system %iowait %steal %idle
13.55 0.04 2.92 0.53 0.00 82.95
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.77 0.00 0.01 0.00 0.40 0.00 73.65 0.00 3.62 2.58 0.00
sdb 1.27 4.75 0.82 3.54 38.00 32.32 32.19 0.11 24.80 4.24 1.85
If your latency problem is due to Redis memory being swapped to disk, you need to lower the memory pressure in your system, either by adding more RAM if Redis is using more memory than is available, or by avoiding running other memory-hungry processes on the same system.
Latency due to AOF and disk I/O
Another source of latency is due to the Append Only File support on Redis. The AOF basically uses two system calls to accomplish its work. One is write(2) that is used in order to write data to the append only file, and the other one is fdatasync(2) that is used in order to flush the kernel file buffer on disk in order to ensure the durability level specified by the user.
Both the write(2) and fdatasync(2) calls can be sources of latency. For instance, write(2) can block both when there is a system-wide sync in progress, and when the output buffers are full and the kernel needs to flush to disk in order to accept new writes.
The fdatasync(2) call is a worse source of latency, since with many combinations of kernels and file systems it can take from a few milliseconds to a few seconds to complete, especially when some other process is doing I/O. For this reason, since Redis 2.4, Redis does the fdatasync(2) call in a different thread when possible.
We’ll see how configuration can affect the amount and source of latency when using the AOF file.
The AOF can be configured to perform an fsync on disk in three different ways using the appendfsync configuration option (this setting can be modified at runtime using the CONFIG SET command).
- When appendfsync is set to the value of no, Redis performs no fsync. In this configuration the only source of latency can be write(2). When this happens there is usually no solution, since the disk simply can't cope with the speed at which Redis is receiving data; however this is uncommon if the disk is not seriously slowed down by other processes doing I/O.
- When appendfsync is set to the value of everysec, Redis performs an fsync every second. It uses a different thread, and if the fsync is still in progress Redis uses a buffer to delay the write(2) call for up to two seconds (since write would block on Linux if an fsync is in progress against the same file). However if the fsync is taking too long, Redis will eventually perform the write(2) call even while the fsync is still in progress, and this can be a source of latency.
- When appendfsync is set to the value of always, an fsync is performed at every write operation before replying back to the client with an OK code (actually Redis will try to cluster many commands executed at the same time into a single fsync). In this mode performance is very low in general and it is strongly recommended to use a fast disk and a file system implementation that can perform the fsync in a short time.
Most Redis users will use either the no or everysec setting for the appendfsync configuration directive. The suggestion for minimum latency is to avoid other processes doing I/O in the same system. Using an SSD disk can help as well, but usually even non-SSD disks perform well with the append only file if the disk is not otherwise busy, since Redis writes to the append only file without performing any seeks.
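The write(2)/fdatasync(2) pattern described above can be sketched as follows (a simplified illustration, not Redis's actual AOF code; the helper name and the fsync fallback are assumptions for platforms lacking fdatasync):

```python
import os
import tempfile

def append_durable(path, payload):
    """write(2) followed by fdatasync(2): the data is on disk
    (durable) before we return, as in appendfsync always."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_APPEND, 0o600)
    try:
        os.write(fd, payload)
        # fdatasync flushes file data without forcing a metadata
        # update; fall back to fsync where it is unavailable.
        getattr(os, "fdatasync", os.fsync)(fd)
    finally:
        os.close(fd)

# Append one RESP-encoded command to an AOF-style log file
path = os.path.join(tempfile.mkdtemp(), "appendonly.aof")
append_durable(path, b"*1\r\n$4\r\nPING\r\n")
print(os.path.getsize(path))  # 14
```

Every call pays the full fsync cost before returning, which is exactly why the always setting trades latency for durability.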
If you want to investigate your latency issues related to the append only file you can use the strace command under Linux:
sudo strace -p $(pidof redis-server) -T -e trace=fdatasync
The above command will show all the fdatasync(2) system calls performed by Redis in the main thread. With the above command you won't see the fdatasync system calls performed by the background thread when the appendfsync config option is set to everysec. In order to do so just add the -f switch to strace.
If you wish you can also see both fdatasync and write system calls with the following command:
sudo strace -p $(pidof redis-server) -T -e trace=fdatasync,write
However since write(2) is also used in order to write data to the client sockets this will likely show too many things unrelated to disk I/O. Apparently there is no way to tell strace to just show slow system calls so I use the following command:
sudo strace -f -p $(pidof redis-server) -T -e trace=fdatasync,write 2>&1 | grep -v '0.0' | grep -v unfinished
Latency generated by expires
Redis evicts expired keys in two ways:
- A lazy way expires a key when it is requested by a command and found to be already expired.
- An active way expires a few keys every 100 milliseconds.
The active expiring is designed to be adaptive. An expire cycle is started every 100 milliseconds (10 times per second), and will do the following:
- Sample ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP keys, evicting all the keys already expired.
- If more than 25% of the keys were found to be expired, repeat.
Given that ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP is set to 20 by default, and the process is performed ten times per second, usually just 200 keys per second are actively expired. This cleans the DB fast enough even when already expired keys are not accessed for a long time, so that the lazy algorithm does not help. At the same time, expiring just 200 keys per second has no effect on the latency of a Redis instance.
However the algorithm is adaptive and will loop if it finds more than 25% of the keys already expired in the set of sampled keys. Given that we run the algorithm ten times per second, the unlucky event is more than 25% of the keys in our random sample expiring within the same second.
Basically this means that if the database has many many keys expiring in the same second, and these make up at least 25% of the current population of keys with an expire set, Redis can block in order to get the percentage of keys already expired below 25%.
This approach is needed in order to avoid using too much memory for keys that are already expired. Usually it is absolutely harmless, since it's unusual for a big number of keys to expire in the same exact second, but it is not impossible that the user used EXPIREAT extensively with the same Unix time.
In short: be aware that many keys expiring at the same moment can be a source of latency.
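The numbers quoted above can be checked with a little arithmetic (the constants come from the defaults stated in the text):

```python
LOOKUPS_PER_LOOP = 20   # ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP default
CYCLES_PER_SEC = 10     # one expire cycle every 100 milliseconds

# Upper bound on keys sampled per second when the cycle does not need
# to repeat (i.e. fewer than 25% of the sampled keys are expired):
keys_sampled_per_sec = LOOKUPS_PER_LOOP * CYCLES_PER_SEC
print(keys_sampled_per_sec)  # 200, matching the text above
```

Once more than 25% of each sample is expired, the loop repeats within the same cycle, which is where the potential blocking comes from.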
Redis software watchdog
Redis 2.6 introduced the Redis Software Watchdog, a debugging tool designed to track latency problems that, for one reason or another, escaped analysis using normal tools.
The software watchdog is an experimental feature. While it is designed to be used in production environments, care should be taken to back up the database before proceeding, as it could possibly have unexpected interactions with the normal execution of the Redis server.
It is important to use it only as last resort when there is no way to track the issue by other means.
This is how this feature works:
- The user enables the software watchdog using the CONFIG SET command.
- Redis starts monitoring itself constantly.
- If Redis detects that the server is blocked in some operation that is not returning fast enough, and that may be the source of the latency issue, a low level report about where the server is blocked is dumped in the log file.
- The user contacts the developers by writing a message in the Redis Google Group, including the watchdog report in the message.
Note that this feature cannot be enabled using the redis.conf file, because it is designed to be enabled only in already running instances and only for debugging purposes.
To enable the feature just use the following:
CONFIG SET watchdog-period 500
The period is specified in milliseconds. In the above example I specified to log latency issues only if the server detects a delay of 500 milliseconds or greater. The minimum configurable period is 200 milliseconds.
When you are done with the software watchdog you can turn it off by setting the watchdog-period parameter to 0. Important: remember to do this, because keeping the instance with the watchdog turned on for longer than needed is generally not a good idea.
The following is an example of what you’ll see printed in the log file once the software watchdog detects a delay longer than the configured one:
[8547 | signal handler] (1333114359)
--- WATCHDOG TIMER EXPIRED ---
/lib/libc.so.6(nanosleep+0x2d) [0x7f16b5c2d39d]
/lib/libpthread.so.0(+0xf8f0) [0x7f16b5f158f0]
/lib/libc.so.6(nanosleep+0x2d) [0x7f16b5c2d39d]
/lib/libc.so.6(usleep+0x34) [0x7f16b5c62844]
./redis-server(debugCommand+0x3e1) [0x43ab41]
./redis-server(call+0x5d) [0x415a9d]
./redis-server(processCommand+0x375) [0x415fc5]
./redis-server(processInputBuffer+0x4f) [0x4203cf]
./redis-server(readQueryFromClient+0xa0) [0x4204e0]
./redis-server(aeProcessEvents+0x128) [0x411b48]
./redis-server(aeMain+0x2b) [0x411dbb]
./redis-server(main+0x2b6) [0x418556]
/lib/libc.so.6(__libc_start_main+0xfd) [0x7f16b5ba1c4d]
./redis-server() [0x411099]
------
Note: in the example the DEBUG SLEEP command was used in order to block the server. The stack trace is different if the server blocks in a different context.
If you happen to collect multiple watchdog stack traces you are encouraged to send everything to the Redis Google Group: the more traces we obtain, the simpler it will be to understand what the problem with your instance is.
13.4 - Redis latency monitoring
Redis is often used for demanding use cases, where it serves a large number of queries per second per instance, but also has strict latency requirements for the average response time and the worst-case latency.
While Redis is an in-memory system, it deals with the operating system in different ways, for example, in the context of persisting to disk. Moreover Redis implements a rich set of commands. Certain commands are fast and run in constant or logarithmic time. Other commands are slower O(N) commands that can cause latency spikes.
Finally, Redis is single threaded. This is usually an advantage from the point of view of the amount of work it can perform per core, and in the latency figures it is able to provide. However, it poses a challenge for latency, since the single thread must be able to perform certain tasks incrementally, for example key expiration, in a way that does not impact the other clients that are served.
For all these reasons, Redis 2.8.13 introduced a new feature called Latency Monitoring, that helps the user to check and troubleshoot possible latency problems. Latency monitoring is composed of the following conceptual parts:
- Latency hooks that sample different latency-sensitive code paths.
- Time series recording of latency spikes, split by different events.
- Reporting engine to fetch raw data from the time series.
- Analysis engine to provide human-readable reports and hints according to the measurements.
The rest of this document covers the latency monitoring subsystem details. For more information about the general topic of Redis and latency, see Redis latency problems troubleshooting.
Events and time series
Different monitored code paths have different names and are called events.
For example, command is an event measuring latency spikes of possibly slow command executions, while fast-command is the event name for the monitoring of the O(1) and O(log N) commands. Other events are less generic and monitor specific operations performed by Redis. For example, the fork event only monitors the time taken by Redis to execute the fork(2) system call.
A latency spike is an event that takes more time to run than the configured latency threshold. There is a separate time series associated with every monitored event. This is how the time series work:
- Every time a latency spike happens, it is logged in the appropriate time series.
- Every time series is composed of 160 elements.
- Each element is a pair made of a Unix timestamp of the time the latency spike was measured and the number of milliseconds the event took to execute.
- Latency spikes for the same event that occur in the same second are merged by taking the maximum latency. Even if continuous latency spikes are measured for a given event, which could happen with a low threshold, at least 180 seconds of history are available.
- The all-time maximum latency is recorded for every event.
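The merging and trimming rules above can be sketched in Ruby (an illustrative model; the class name LatencySeries is made up, and Redis implements this in C):

```ruby
# Minimal model of one latency time series: spikes that occur in the
# same second are merged by keeping the maximum, and only the newest
# 160 samples are retained.
class LatencySeries
  MAX_SAMPLES = 160

  attr_reader :samples, :all_time_max

  def initialize
    @samples = []      # pairs of [unix_second, max_latency_ms]
    @all_time_max = 0
  end

  def record(unix_second, latency_ms)
    @all_time_max = latency_ms if latency_ms > @all_time_max
    last = @samples.last
    if last && last[0] == unix_second
      last[1] = latency_ms if latency_ms > last[1]    # merge: keep the max
    else
      @samples << [unix_second, latency_ms]
      @samples.shift if @samples.length > MAX_SAMPLES # drop the oldest
    end
  end
end
```

Two spikes logged in the same second therefore leave a single sample holding the larger latency.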
The framework monitors and logs latency spikes in the execution time of these events:
- command: regular commands.
- fast-command: O(1) and O(log N) commands.
- fork: the fork(2) system call.
- rdb-unlink-temp-file: the unlink(2) system call.
- aof-write: writing to the AOF - a catchall event for write(2) system calls.
- aof-fsync-always: the fsync(2) system call when invoked by the appendfsync always policy.
- aof-write-pending-fsync: the fsync(2) system call when there are pending writes.
- aof-write-active-child: the fsync(2) system call when performed by a child process.
- aof-write-alone: the fsync(2) system call when performed by the main process.
- aof-fstat: the fstat(2) system call.
- aof-rename: the rename(2) system call for renaming the temporary file after completing BGREWRITEAOF.
- aof-rewrite-diff-write: writing the differences accumulated while performing BGREWRITEAOF.
- active-defrag-cycle: the active defragmentation cycle.
- expire-cycle: the expiration cycle.
- eviction-cycle: the eviction cycle.
- eviction-del: deletes during the eviction cycle.
How to enable latency monitoring
What is high latency for one use case may not be considered high latency for another. Some applications may require that all queries be served in less than 1 millisecond. For other applications, it may be acceptable for a small amount of clients to experience a 2 second latency on occasion.
The first step to enable the latency monitor is to set a latency threshold in milliseconds. Only events that take longer than the specified threshold will be logged as latency spikes. The user should set the threshold according to their needs. For example, if the application requires a maximum acceptable latency of 100 milliseconds, the threshold should be set to log all the events blocking the server for a time equal or greater to 100 milliseconds.
Enable the latency monitor at runtime in a production server with the following command:
CONFIG SET latency-monitor-threshold 100
Monitoring is turned off by default (threshold set to 0), even if the actual cost of latency monitoring is near zero. While the memory requirements of latency monitoring are very small, there is no good reason to raise the baseline memory usage of a Redis instance that is working well.
Report information with the LATENCY command
The user interface to the latency monitoring subsystem is the LATENCY command.
Like many other Redis commands, LATENCY accepts subcommands that modify its behavior. These subcommands are:
- LATENCY LATEST - returns the latest latency samples for all events.
- LATENCY HISTORY - returns latency time series for a given event.
- LATENCY RESET - resets latency time series data for one or more events.
- LATENCY GRAPH - renders an ASCII-art graph of an event's latency samples.
- LATENCY DOCTOR - replies with a human-readable latency analysis report.
Refer to each subcommand’s documentation page for further information.
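For example, LATENCY LATEST replies with one entry per event, each holding the event name, the Unix timestamp of the latest spike, the latest latency in milliseconds, and the all-time maximum. A small Ruby helper can reshape such a reply (the sample data below is made up for illustration):

```ruby
# Reshape a LATENCY LATEST style reply (rows of
# [event, timestamp, latest_ms, max_ms]) into a hash keyed by event.
def parse_latency_latest(reply)
  reply.each_with_object({}) do |(event, ts, latest, max), out|
    out[event] = { timestamp: ts, latest_ms: latest, max_ms: max }
  end
end

# Hypothetical sample reply, as a Ruby client might return it:
sample = [
  ["command", 1433000000, 250, 1000],
  ["fork",    1433000100, 120,  120]
]
stats = parse_latency_latest(sample)
puts stats["command"][:max_ms]   # => 1000 (all-time max for "command")
```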
13.5 - Memory optimization
Special encoding of small aggregate data types
Since Redis 2.2 many data types are optimized to use less space up to a certain size. Hashes, Lists, Sets composed of just integers, and Sorted Sets, when smaller than a given number of elements, and up to a maximum element size, are encoded in a very memory efficient way that uses up to 10 times less memory (with 5 times less memory used being the average saving).
This is completely transparent from the point of view of the user and API. Since this is a CPU / memory trade off it is possible to tune the maximum number of elements and maximum element size for special encoded types using the following redis.conf directives.
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
set-max-intset-entries 512
If a specially encoded value overflows the configured max size, Redis will automatically convert it into normal encoding. This operation is very fast for small values, but if you change the setting in order to use specially encoded values for much larger aggregate types the suggestion is to run some benchmarks and tests to check the conversion time.
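The conversion rule can be modeled as: a hash keeps the compact encoding only while both limits hold. A minimal Ruby sketch of that rule, using the default thresholds shown above (this reproduces the rule for illustration, not the server's actual code path):

```ruby
# Mirror of hash-max-ziplist-entries / hash-max-ziplist-value.
HASH_MAX_ZIPLIST_ENTRIES = 512
HASH_MAX_ZIPLIST_VALUE   = 64

# True while the hash can stay in the compact encoding.
def compact_hash_encoding?(fields)
  return false if fields.length > HASH_MAX_ZIPLIST_ENTRIES
  fields.all? { |field, value|
    field.to_s.bytesize <= HASH_MAX_ZIPLIST_VALUE &&
      value.to_s.bytesize <= HASH_MAX_ZIPLIST_VALUE
  }
end

puts compact_hash_encoding?("name" => "alice")   # => true  (stays compact)
puts compact_hash_encoding?("bio" => "x" * 65)   # => false (value too large)
```

Note that once either limit is exceeded the hash is converted to a real hash table and is never converted back, even if it later shrinks again.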
Using 32 bit instances
Redis compiled with a 32 bit target uses a lot less memory per key, since pointers are small, but such an instance will be limited to 4 GB of maximum memory usage. To compile Redis as a 32 bit binary use make 32bit. RDB and AOF files are compatible between 32 bit and 64 bit instances (and between little and big endian of course), so you can switch from 32 to 64 bit, or the contrary, without problems.
Bit and byte level operations
Redis 2.2 introduced new bit and byte level operations: GETRANGE, SETRANGE, GETBIT and SETBIT.
Using these commands you can treat the Redis string type as a random access array.
For instance if you have an application where users are identified by a unique progressive integer number,
you can use a bitmap in order to save information about the subscription of users in a mailing list,
setting the bit for subscribed and clearing it for unsubscribed, or the other way around.
With 100 million users this data will take just 12 megabytes of RAM in a Redis instance.
You can do the same using GETRANGE and SETRANGE in order to store one byte of information for each user.
This is just an example but it is actually possible to model a number of problems in very little space with these new primitives.
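The 12 megabyte figure follows from using one bit per user. A quick Ruby check of the arithmetic (with a real client the per-user bit would be addressed by SETBIT/GETBIT; only the sizing is shown here):

```ruby
# One bit per user: memory needed for a mailing list subscription bitmap.
def bitmap_bytes(num_users)
  (num_users + 7) / 8   # integer division, rounded up to whole bytes
end

puts bitmap_bytes(100_000_000)   # => 12500000 bytes, ~12 MB

# With a real client the bit for a user would be addressed directly by id,
# e.g. r.setbit("subscribed", user_id, 1) and r.getbit("subscribed", user_id).
```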
Use hashes when possible
Small hashes are encoded in a very small space, so you should try representing your data using hashes whenever possible. For instance if you have objects representing users in a web application, instead of using different keys for name, surname, email, password, use a single hash with all the required fields.
If you want to know more about this, read the next section.
Using hashes to abstract a very memory efficient plain key-value store on top of Redis
I understand the title of this section is a bit scary, but I'm going to explain in detail what this is about.
Basically it is possible to model a plain key-value store using Redis where values can only be strings. This is not just more memory efficient than plain Redis keys, but also much more memory efficient than memcached.
Let’s start with some facts: a few keys use a lot more memory than a single key containing a hash with a few fields. How is this possible? We use a trick. In theory in order to guarantee that we perform lookups in constant time (also known as O(1) in big O notation) there is the need to use a data structure with a constant time complexity in the average case, like a hash table.
But many times hashes contain just a few fields. When hashes are small we can instead just encode them in an O(N) data structure, like a linear array with length-prefixed key value pairs. Since we do this only when N is small, the amortized time for HGET and HSET commands is still O(1): the hash will be converted into a real hash table as soon as the number of elements it contains grows too large (you can configure the limit in redis.conf).
This does not only work well from the point of view of time complexity, but also from the point of view of constant times, since a linear array of key value pairs happens to play very well with the CPU cache (it has a better cache locality than a hash table).
However, since hash fields and values are not (always) represented as full featured Redis objects, hash fields can't have an associated time to live (expire) like a real key, and can only contain a string. But we are okay with this; it was the intention anyway when the hash data type API was designed (we trust simplicity more than features, so nested data structures are not allowed, just as expires of single fields are not allowed).
So hashes are memory efficient. This is useful when using hashes to represent objects or to model other problems where there are groups of related fields. But what if we have a plain key-value workload?
Imagine we want to use Redis as a cache for many small objects, that can be JSON encoded objects, small HTML fragments, simple key -> boolean values and so forth. Basically anything is a string -> string map with small keys and values.
Now let’s assume the objects we want to cache are numbered, like:
- object:102393
- object:1234
- object:5
This is what we can do. Every time we perform a SET operation to set a new value, we actually split the key into two parts, one part used as a key, and the other part used as the field name for the hash. For instance the object named “object:1234” is actually split into:
- a Key named object:12
- a Field named 34
So we use all the characters but the last two for the key, and the final two characters for the hash field name. To set our key we use the following command:
HSET object:12 34 somevalue
As you can see, every hash will end up containing 100 fields, which is an optimal compromise between CPU and memory saved.
There is another important thing to note: with this schema every hash will have more or less 100 fields regardless of the number of objects we cached. This is because our objects will always end with a number, and not a random string. In some way the final number can be considered a form of implicit pre-sharding.
What about small numbers? Like object:2? We handle this case using just “object:” as a key name, and the whole number as the hash field name. So object:2 and object:10 will both end inside the key “object:”, but one as field name “2” and one as “10”.
How much memory do we save this way?
I used the following Ruby program to test how this works:
require 'rubygems'
require 'redis'
USE_OPTIMIZATION = true
def hash_get_key_field(key)
s = key.split(':')
if s[1].length > 2
{ key: s[0] + ':' + s[1][0..-3], field: s[1][-2..-1] }
else
{ key: s[0] + ':', field: s[1] }
end
end
def hash_set(r, key, value)
kf = hash_get_key_field(key)
r.hset(kf[:key], kf[:field], value)
end
def hash_get(r, key)
  kf = hash_get_key_field(key)
  r.hget(kf[:key], kf[:field])
end
r = Redis.new
(0..100_000).each do |id|
key = "object:#{id}"
if USE_OPTIMIZATION
hash_set(r, key, 'val')
else
r.set(key, 'val')
end
end
This is the result against a 64 bit instance of Redis 2.2:
- USE_OPTIMIZATION set to true: 1.7 MB of used memory
- USE_OPTIMIZATION set to false: 11 MB of used memory
This is an order of magnitude difference. I think this makes Redis more or less the most memory efficient plain key-value store out there.
WARNING: for this to work, make sure that in your redis.conf you have something like this:
hash-max-zipmap-entries 256
Also remember to set the following field according to the maximum size of your keys and values:
hash-max-zipmap-value 1024
Every time a hash exceeds the number of elements or element size specified it will be converted into a real hash table, and the memory saving will be lost.
You may ask, why don't you do this implicitly in the normal key space so that I don't have to care? There are two reasons: one is that we tend to make tradeoffs explicit, and this is a clear tradeoff between many things: CPU, memory, and max element size. The second is that the top level key space must support a lot of interesting things like expires, LRU data, and so forth, so it is not practical to do this in a general way.
But the Redis Way is that the user must understand how things work so that they are able to pick the best compromise, and to understand how the system will behave exactly.
Memory allocation
To store user keys, Redis allocates at most as much memory as the maxmemory setting enables (however there are small extra allocations possible). The exact value can be set in the configuration file or set later via CONFIG SET (see Using memory as an LRU cache for more info).
There are a few things that should be noted about how Redis manages memory:
- Redis will not always free up (return) memory to the OS when keys are removed. This is not something special about Redis, but it is how most malloc() implementations work. For example if you fill an instance with 5GB worth of data, and then remove the equivalent of 2GB of data, the Resident Set Size (also known as the RSS, which is the number of memory pages consumed by the process) will probably still be around 5GB, even if Redis will claim that the user memory is around 3GB. This happens because the underlying allocator can’t easily release the memory. For example often most of the removed keys were allocated in the same pages as the other keys that still exist.
- The previous point means that you need to provision memory based on your peak memory usage. If your workload from time to time requires 10GB, even if most of the times 5GB could do, you need to provision for 10GB.
- However allocators are smart and are able to reuse free chunks of memory, so after you freed 2GB of your 5GB data set, when you start adding more keys again, you’ll see the RSS (Resident Set Size) stay steady and not grow more, as you add up to 2GB of additional keys. The allocator is basically trying to reuse the 2GB of memory previously (logically) freed.
- Because of all this, the fragmentation ratio is not reliable when your peak memory usage was much larger than the currently used memory. Fragmentation is calculated as the physical memory actually used (the RSS value) divided by the amount of memory currently in use (the sum of all the allocations performed by Redis). Because the RSS reflects the peak memory, when the (virtually) used memory is low because a lot of keys / values were freed but the RSS is high, the ratio RSS / mem_used will be very high.
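As a concrete illustration of why the ratio misleads after a peak (the numbers below are made up):

```ruby
# The fragmentation ratio as reported by INFO: RSS divided by used memory.
def fragmentation_ratio(rss_bytes, used_bytes)
  rss_bytes.to_f / used_bytes
end

GB = 1024**3
puts fragmentation_ratio(5 * GB, 5 * GB)   # at peak: ratio is 1.0
puts fragmentation_ratio(5 * GB, 3 * GB)   # after freeing 2GB, RSS unchanged: ~1.67
```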
If maxmemory is not set, Redis will keep allocating memory as it sees fit, and thus it can (gradually) eat up all your free memory. Therefore it is generally advisable to configure some limit. You may also want to set maxmemory-policy to noeviction (which is not the default in some older versions of Redis).
This makes Redis return an out of memory error for write commands if and when it reaches the limit, which in turn may result in errors in the application but will not render the whole machine dead because of memory starvation.
14 - Redis programming patterns
The following documents describe some novel development patterns you can use with Redis.
14.1 - Bulk loading
Bulk loading is the process of loading Redis with a large amount of pre-existing data. Ideally, you want to perform this operation quickly and efficiently. This document describes some strategies for bulk loading data in Redis.
Bulk loading using the Redis protocol
Using a normal Redis client to perform bulk loading is not a good idea for a few reasons: the naive approach of sending one command after the other is slow because you have to pay for the round trip time for every command. It is possible to use pipelining, but for bulk loading of many records you need to write new commands while you read replies at the same time to make sure you are inserting as fast as possible.
Only a small percentage of clients support non-blocking I/O, and not all the clients are able to parse the replies in an efficient way in order to maximize throughput. For all of these reasons the preferred way to mass import data into Redis is to generate a text file containing the Redis protocol, in raw format, in order to call the commands needed to insert the required data.
For instance if I need to generate a large data set where there are billions of keys in the form `keyN -> ValueN`, I will create a file containing the following commands in the Redis protocol format:
SET Key0 Value0
SET Key1 Value1
...
SET KeyN ValueN
Once this file is created, the remaining action is to feed it to Redis as fast as possible. In the past the way to do this was to use netcat with the following command:
(cat data.txt; sleep 10) | nc localhost 6379 > /dev/null
However this is not a very reliable way to perform mass import because netcat does not really know when all the data was transferred and can't check for errors. In 2.6 or later versions of Redis, the redis-cli utility supports a new mode called pipe mode that was designed in order to perform bulk loading.
Using the pipe mode the command to run looks like the following:
cat data.txt | redis-cli --pipe
That will produce an output similar to this:
All data transferred. Waiting for the last reply...
Last reply received from server.
errors: 0, replies: 1000000
The redis-cli utility will also make sure to only redirect errors received from the Redis instance to the standard output.
Generating Redis Protocol
The Redis protocol is extremely simple to generate and parse, and is documented here. However, in order to generate protocol for the goal of bulk loading you don't need to understand every detail of the protocol, but just that every command is represented in the following way:
*<args><cr><lf>
$<len><cr><lf>
<arg0><cr><lf>
<arg1><cr><lf>
...
<argN><cr><lf>
Where <cr> means "\r" (or ASCII character 13) and <lf> means "\n" (or ASCII character 10).
For instance the command SET key value is represented by the following protocol:
*3<cr><lf>
$3<cr><lf>
SET<cr><lf>
$3<cr><lf>
key<cr><lf>
$5<cr><lf>
value<cr><lf>
Or represented as a quoted string:
"*3\r\n$3\r\nSET\r\n$3\r\nkey\r\n$5\r\nvalue\r\n"
The file you need to generate for bulk loading is just composed of commands represented in the above way, one after the other.
The following Ruby function generates valid protocol:
def gen_redis_proto(*cmd)
proto = ""
proto << "*"+cmd.length.to_s+"\r\n"
cmd.each{|arg|
proto << "$"+arg.to_s.bytesize.to_s+"\r\n"
proto << arg.to_s+"\r\n"
}
proto
end
puts gen_redis_proto("SET","mykey","Hello World!").inspect
Using the above function it is possible to easily generate the key value pairs in the above example, with this program:
(0...1000).each{|n|
STDOUT.write(gen_redis_proto("SET","Key#{n}","Value#{n}"))
}
We can run the program directly in a pipe to redis-cli in order to perform our first mass import session.
$ ruby proto.rb | redis-cli --pipe
All data transferred. Waiting for the last reply...
Last reply received from server.
errors: 0, replies: 1000
How the pipe mode works under the hood
The magic needed inside the pipe mode of redis-cli is to be as fast as netcat while still being able to understand when the last reply was sent by the server.
This is obtained in the following way:
- redis-cli –pipe tries to send data as fast as possible to the server.
- At the same time it reads data when available, trying to parse it.
- Once there is no more data to read from stdin, it sends a special ECHO command with a random 20 byte string: we are sure this is the last command sent, and we are sure we can match the reply by checking for a bulk reply containing the same 20 bytes.
- Once this special final command is sent, the code receiving replies starts to match replies with these 20 bytes. When the matching reply is reached it can exit with success.
Using this trick we don’t need to parse the protocol we send to the server in order to understand how many commands we are sending, but just the replies.
However, while parsing the replies, we keep a counter of all the replies parsed so that at the end we are able to tell the user the number of commands transferred to the server by the mass insert session.
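The sentinel trick described above can be sketched in Ruby (an illustrative model of what redis-cli --pipe does internally, not its actual C code; here 20 random hex characters stand in for the random 20 byte string):

```ruby
require 'securerandom'

# Build the final ECHO command with a random payload, plus a matcher
# that spots the corresponding bulk reply in the reply stream.
def make_sentinel
  payload = SecureRandom.hex(10)   # 10 random bytes -> 20 hex characters
  command = "*2\r\n$4\r\nECHO\r\n$#{payload.bytesize}\r\n#{payload}\r\n"
  [command, payload]
end

def transfer_finished?(reply_stream, payload)
  # The transfer is done once the echoed payload appears as a bulk reply.
  reply_stream.include?("$#{payload.bytesize}\r\n#{payload}\r\n")
end
```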
14.2 - Distributed Locks with Redis
Distributed locks are a very useful primitive in many environments where different processes must operate with shared resources in a mutually exclusive way.
There are a number of libraries and blog posts describing how to implement a DLM (Distributed Lock Manager) with Redis, but every library uses a different approach, and many use a simple approach with lower guarantees compared to what can be achieved with slightly more complex designs.
This page describes a more canonical algorithm to implement distributed locks with Redis. We propose an algorithm, called Redlock, which implements a DLM which we believe to be safer than the vanilla single instance approach. We hope that the community will analyze it, provide feedback, and use it as a starting point for implementations or more complex or alternative designs.
Implementations
Before describing the algorithm, here are a few links to implementations already available that can be used for reference.
- Redlock-rb (Ruby implementation). There is also a fork of Redlock-rb that adds a gem for easy distribution.
- Redlock-py (Python implementation).
- Pottery (Python implementation).
- Aioredlock (Asyncio Python implementation).
- Redlock-php (PHP implementation).
- PHPRedisMutex (further PHP implementation).
- cheprasov/php-redis-lock (PHP library for locks).
- rtckit/react-redlock (Async PHP implementation).
- Redsync (Go implementation).
- Redisson (Java implementation).
- Redis::DistLock (Perl implementation).
- Redlock-cpp (C++ implementation).
- Redlock-cs (C#/.NET implementation).
- RedLock.net (C#/.NET implementation). Includes async and lock extension support.
- ScarletLock (C# .NET implementation with configurable datastore).
- Redlock4Net (C# .NET implementation).
- node-redlock (NodeJS implementation). Includes support for lock extension.
Safety and Liveness Guarantees
We are going to model our design with just three properties that, from our point of view, are the minimum guarantees needed to use distributed locks in an effective way.
- Safety property: Mutual exclusion. At any given moment, only one client can hold a lock.
- Liveness property A: Deadlock free. Eventually it is always possible to acquire a lock, even if the client that locked a resource crashes or gets partitioned.
- Liveness property B: Fault tolerance. As long as the majority of Redis nodes are up, clients are able to acquire and release locks.
Why Failover-based Implementations Are Not Enough
To understand what we want to improve, let’s analyze the current state of affairs with most Redis-based distributed lock libraries.
The simplest way to use Redis to lock a resource is to create a key in an instance. The key is usually created with a limited time to live, using the Redis expires feature, so that eventually it will get released (property 2 in our list). When the client needs to release the resource, it deletes the key.
Superficially this works well, but there is a problem: this is a single point of failure in our architecture. What happens if the Redis master goes down? Well, let’s add a replica! And use it if the master is unavailable. This is unfortunately not viable. By doing so we can’t implement our safety property of mutual exclusion, because Redis replication is asynchronous.
There is a race condition with this model:
- Client A acquires the lock in the master.
- The master crashes before the write to the key is transmitted to the replica.
- The replica gets promoted to master.
- Client B acquires the lock to the same resource A already holds a lock for. SAFETY VIOLATION!
Sometimes it is perfectly fine that, under special circumstances such as a failure, multiple clients hold the lock at the same time. If this is the case, you can use your replication-based solution. Otherwise we suggest implementing the solution described in this document.
Correct Implementation with a Single Instance
Before trying to overcome the limitation of the single instance setup described above, let’s check how to do it correctly in this simple case, since this is actually a viable solution in applications where a race condition from time to time is acceptable, and because locking into a single instance is the foundation we’ll use for the distributed algorithm described here.
To acquire the lock, the way to go is the following:
SET resource_name my_random_value NX PX 30000
The command will set the key only if it does not already exist (NX option), with an expire of 30000 milliseconds (PX option).
The key is set to a value “my_random_value”. This value must be unique across all clients and all lock requests.
Basically the random value is used in order to release the lock in a safe way, with a script that tells Redis: remove the key only if it exists and the value stored at the key is exactly the one I expect to be. This is accomplished by the following Lua script:
if redis.call("get",KEYS[1]) == ARGV[1] then
return redis.call("del",KEYS[1])
else
return 0
end
This is important in order to avoid removing a lock that was created by another client. For example a client may acquire the lock, get blocked performing some operation for longer than the lock validity time (the time at which the key will expire), and later remove the lock, that was already acquired by some other client.
Using just DEL is not safe, as a client may remove another client's lock. With the above script instead, every lock is "signed" with a random string, so the lock will be removed only if it is still the one that was set by the client trying to remove it.
What should this random string be? We assume it's 20 bytes from /dev/urandom, but you can find cheaper ways to make it unique enough for your tasks. For example a safe pick is to seed RC4 with /dev/urandom and generate a pseudo random stream from that.
A simpler solution is to use a UNIX timestamp with microsecond precision, concatenating the timestamp with a client ID. It is not as safe, but probably sufficient for most environments.
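For example, in Ruby (SecureRandom reads from the operating system's CSPRNG, a reasonable stand-in for /dev/urandom; simple_token is a hypothetical helper for the timestamp variant):

```ruby
require 'securerandom'

# 20 random bytes from the OS CSPRNG, hex encoded so the lock value
# is a plain printable string.
token = SecureRandom.hex(20)   # 40 hex characters

# The simpler (weaker) variant: microsecond timestamp plus a client id.
# simple_token is a hypothetical helper, not part of any library.
def simple_token(client_id)
  t = Time.now
  "#{t.to_i}#{format('%06d', t.usec)}-#{client_id}"
end
```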
The “lock validity time” is the time we use as the key’s time to live. It is both the auto release time, and the time the client has in order to perform the operation required before another client may be able to acquire the lock again, without technically violating the mutual exclusion guarantee, which is only limited to a given window of time from the moment the lock is acquired.
So now we have a good way to acquire and release the lock. With this system, reasoning about a non-distributed system composed of a single, always available, instance, is safe. Let’s extend the concept to a distributed system where we don’t have such guarantees.
The Redlock Algorithm
In the distributed version of the algorithm we assume we have N Redis masters. Those nodes are totally independent, so we don’t use replication or any other implicit coordination system. We already described how to acquire and release the lock safely in a single instance. We take for granted that the algorithm will use this method to acquire and release the lock in a single instance. In our examples we set N=5, which is a reasonable value, so we need to run 5 Redis masters on different computers or virtual machines in order to ensure that they’ll fail in a mostly independent way.
In order to acquire the lock, the client performs the following operations:
- It gets the current time in milliseconds.
- It tries to acquire the lock in all the N instances sequentially, using the same key name and random value in all the instances. During step 2, when setting the lock in each instance, the client uses a timeout which is small compared to the total lock auto-release time in order to acquire it. For example if the auto-release time is 10 seconds, the timeout could be in the ~ 5-50 milliseconds range. This prevents the client from remaining blocked for a long time trying to talk with a Redis node which is down: if an instance is not available, we should try to talk with the next instance ASAP.
- The client computes how much time elapsed in order to acquire the lock, by subtracting from the current time the timestamp obtained in step 1. If and only if the client was able to acquire the lock in the majority of the instances (at least 3), and the total time elapsed to acquire the lock is less than lock validity time, the lock is considered to be acquired.
- If the lock was acquired, its validity time is considered to be the initial validity time minus the time elapsed, as computed in step 3.
- If the client failed to acquire the lock for some reason (either it was not able to lock N/2+1 instances or the validity time is negative), it will try to unlock all the instances (even the instances it believed it was not able to lock).
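Steps 3 to 5 reduce to a quorum-plus-time check. A Ruby sketch of that decision, with the network part left out (redlock_acquired? is a hypothetical helper; the 2 ms drift allowance is an arbitrary placeholder for the clock drift compensation):

```ruby
# Decide whether a Redlock acquisition attempt succeeded.
# acquired: instances where SET ... NX PX succeeded; n: total instances;
# elapsed_ms: time spent acquiring (step 3); ttl_ms: auto-release time.
def redlock_acquired?(acquired:, n:, elapsed_ms:, ttl_ms:, drift_ms: 2)
  quorum = n / 2 + 1                          # majority, e.g. 3 out of 5
  validity = ttl_ms - elapsed_ms - drift_ms   # remaining validity (step 4)
  acquired >= quorum && validity > 0 ? validity : nil
end

puts redlock_acquired?(acquired: 3, n: 5, elapsed_ms: 40, ttl_ms: 10_000)
# => 9958 (remaining validity in ms)
```

A nil result means the attempt failed and, per step 5, the client should unlock all the instances before retrying.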
Is the Algorithm Asynchronous?
The algorithm relies on the assumption that while there is no synchronized clock across the processes, the local time in every process updates at approximately the same rate, with a small margin of error compared to the auto-release time of the lock. This assumption closely resembles real-world computers: every computer has a local clock and we can usually rely on different computers to have a clock drift which is small.
At this point we need to better specify our mutual exclusion rule: it is guaranteed only as long as the client holding the lock terminates its work within the lock validity time (as obtained in step 3), minus some time (just a few milliseconds in order to compensate for clock drift between processes).
This paper contains more information about similar systems requiring a bound clock drift: Leases: an efficient fault-tolerant mechanism for distributed file cache consistency.
Retry on Failure
When a client is unable to acquire the lock, it should try again after a random delay in order to try to desynchronize multiple clients trying to acquire the lock for the same resource at the same time (this may result in a split brain condition where nobody wins). Also the faster a client tries to acquire the lock in the majority of Redis instances, the smaller the window for a split brain condition (and the need for a retry), so ideally the client should try to send the SET
commands to the N instances at the same time using multiplexing.
It is worth stressing how important it is for clients that fail to acquire the majority of locks, to release the (partially) acquired locks ASAP, so that there is no need to wait for key expiry in order for the lock to be acquired again (however if a network partition happens and the client is no longer able to communicate with the Redis instances, there is an availability penalty to pay as it waits for key expiration).
Releasing the Lock
Releasing the lock is simple, and involves just releasing the lock in all the instances, whether or not the client believes it was able to successfully lock a given instance.
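A minimal sketch of a safe release, again with a dict standing in for a Redis instance: the key is deleted only if it still holds the client's random value, so a client never removes a lock that expired and was since acquired by someone else. On a real server this check-and-delete must be atomic (typically done in a Lua script); the `release_lock` name is illustrative:

```python
# Stand-in for one Redis instance: a plain dict. On a real server the
# check-and-delete below must run atomically, e.g. as a Lua script:
#   if redis.call("get", KEYS[1]) == ARGV[1] then
#       return redis.call("del", KEYS[1])
#   end
#   return 0
def release_lock(instance, key, token):
    """Delete the lock only if it still holds our random value, so we
    never remove a lock that expired and was taken by another client."""
    if instance.get(key) == token:
        del instance[key]
        return True
    return False

instance = {"resource": "token-A"}
print(release_lock(instance, "resource", "token-B"))  # wrong token: False
print(release_lock(instance, "resource", "token-A"))  # our token: True
```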
Safety Arguments
Is the algorithm safe? Let’s examine what happens in different scenarios.
To start let’s assume that a client is able to acquire the lock in the majority of instances. All the instances will contain a key with the same time to live. However, the key was set at different times, so the keys will also expire at different times. But if the first key was set at worst at time T1 (the time we sample before contacting the first server) and the last key was set at worst at time T2 (the time we obtained the reply from the last server), we are sure that the first key to expire in the set will exist for at least MIN_VALIDITY=TTL-(T2-T1)-CLOCK_DRIFT
. All the other keys will expire later, so we are sure that the keys will be simultaneously set for at least this time.
During the time that the majority of keys are set, another client will not be able to acquire the lock, since N/2+1 SET NX operations can’t succeed if N/2+1 keys already exist. So if a lock was acquired, it is not possible to re-acquire it at the same time (violating the mutual exclusion property).
However we want to also make sure that multiple clients trying to acquire the lock at the same time can’t simultaneously succeed.
If a client locked the majority of instances using a time near, or greater, than the lock maximum validity time (the TTL we use for SET basically), it will consider the lock invalid and will unlock the instances, so we only need to consider the case where a client was able to lock the majority of instances in a time which is less than the validity time. In this case for the argument already expressed above, for MIN_VALIDITY
no client should be able to re-acquire the lock. So multiple clients will be able to lock N/2+1 instances at the same time (with “time” being the end of Step 2) only when the time to lock the majority was greater than the TTL time, making the lock invalid.
Liveness Arguments
The system liveness is based on three main features:
- The auto release of the lock (since keys expire): eventually keys are available again to be locked.
- The fact that clients, usually, will cooperate by removing the locks when the lock was not acquired, or when the lock was acquired and the work terminated, making it likely that we don’t have to wait for keys to expire to re-acquire the lock.
- The fact that when a client needs to retry a lock, it waits a time which is comparably greater than the time needed to acquire the majority of locks, in order to probabilistically make split brain conditions during resource contention unlikely.
However, we pay an availability penalty equal to TTL
time on network partitions, so if there are continuous partitions, we can pay this penalty indefinitely.
This happens every time a client acquires a lock and gets partitioned away before being able to remove the lock.
Basically, if there are infinite continuous network partitions, the system may remain unavailable for an infinite amount of time.
Performance, Crash Recovery and fsync
Many users using Redis as a lock server need high performance in terms of both latency to acquire and release a lock, and number of acquire / release operations that it is possible to perform per second. In order to meet this requirement, the strategy to talk with the N Redis servers to reduce latency is definitely multiplexing (putting the socket in non-blocking mode, sending all the commands, and reading all the replies later, assuming that the RTT between the client and each instance is similar).
However there is another consideration around persistence if we want to target a crash-recovery system model.
Basically, to see the problem here, let’s assume we configure Redis without persistence at all. A client acquires the lock in 3 of 5 instances. One of the instances where the client was able to acquire the lock is restarted. At this point there are again 3 instances that we can lock for the same resource, and another client can lock it again, violating the exclusivity safety property.
If we enable AOF persistence, things will improve quite a bit. For example we can upgrade a server by sending it a SHUTDOWN
command and restarting it. Because Redis expires are semantically implemented so that time still elapses when the server is off, all our requirements are fine.
However everything is fine as long as it is a clean shutdown. What about a power outage? If Redis is configured, as by default, to fsync on disk every second, it is possible that after a restart our key is missing. In theory, if we want to guarantee the lock safety in the face of any kind of instance restart, we need to enable fsync=always
in the persistence settings. This will affect performance due to the additional sync overhead.
However, things are better than they look at first glance. Basically, the algorithm safety is retained as long as when an instance restarts after a crash, it no longer participates in any currently active lock. This means that the set of currently active locks when the instance restarts were all obtained by locking instances other than the one which is rejoining the system.
To guarantee this we just need to make an instance, after a crash, unavailable
for at least a bit more than the max TTL
we use. This is the time needed
for all the keys about the locks that existed when the instance crashed to
become invalid and be automatically released.
Using delayed restarts it is basically possible to achieve safety even
without any kind of Redis persistence available, however note that this may
translate into an availability penalty. For example if a majority of instances
crash, the system will become globally unavailable for TTL
(here globally means
that no resource at all will be lockable during this time).
Making the algorithm more reliable: Extending the lock
If the work performed by clients consists of small steps, it is possible to use smaller lock validity times by default, and extend the algorithm implementing a lock extension mechanism. Basically, if the lock validity is approaching a low value in the middle of the computation, the client may extend the lock by sending a Lua script to all the instances that extends the TTL of the key, if the key exists and its value is still the random value the client assigned when the lock was acquired.
The client should only consider the lock re-acquired if it was able to extend the lock into the majority of instances, and within the validity time (basically the algorithm to use is very similar to the one used when acquiring the lock).
However this does not technically change the algorithm, so the maximum number of lock reacquisition attempts should be limited, otherwise one of the liveness properties is violated.
Want to help?
If you are into distributed systems, it would be great to have your opinion / analysis. Also reference implementations in other languages could be great.
Thanks in advance!
Analysis of Redlock
- Martin Kleppmann analyzed Redlock here. A counterpoint to this analysis can be found here.
14.3 - Secondary indexing
Redis is not exactly a key-value store, since values can be complex data structures. However it has an external key-value shell: at API level data is addressed by the key name. It is fair to say that, natively, Redis only offers primary key access. However since Redis is a data structures server, its capabilities can be used for indexing, in order to create secondary indexes of different kinds, including composite (multi-column) indexes.
This document explains how it is possible to create indexes in Redis using the following data structures:
- Sorted sets to create secondary indexes by ID or other numerical fields.
- Sorted sets with lexicographical ranges for creating more advanced secondary indexes, composite indexes and graph traversal indexes.
- Sets for creating random indexes.
- Lists for creating simple iterable indexes and last N items indexes.
Implementing and maintaining indexes with Redis is an advanced topic, so most users that need to perform complex queries on data should understand if they are better served by a relational store. However often, especially in caching scenarios, there is the explicit need to store indexed data into Redis in order to speed up common queries which require some form of indexing in order to be executed.
Simple numerical indexes with sorted sets
The simplest secondary index you can create with Redis uses the sorted set data type, a data structure representing a set of elements ordered by a floating point number, the score of each element. Elements are ordered from the smallest to the highest score.
Since the score is a double precision float, indexes you can build with vanilla sorted sets are limited to things where the indexing field is a number within a given range.
The two commands to build these kind of indexes are ZADD
and
ZRANGEBYSCORE
to respectively add items and retrieve items within a
specified range.
For instance, it is possible to index a set of person names by their age by adding elements to a sorted set. The element will be the name of the person and the score will be the age.
ZADD myindex 25 Manuel
ZADD myindex 18 Anna
ZADD myindex 35 Jon
ZADD myindex 67 Helen
In order to retrieve all persons with an age between 20 and 40, the following command can be used:
ZRANGEBYSCORE myindex 20 40
1) "Manuel"
2) "Jon"
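To make the behavior concrete, here is a tiny in-memory model of the sorted set above (the `zadd`/`zrangebyscore` helpers are illustrative Python, not the Redis implementation):

```python
# A toy model of a Redis sorted set: a list of (score, member) pairs kept
# sorted by score, then lexicographically by member (as Redis does).
def zadd(zset, score, member):
    zset[:] = [(s, m) for s, m in zset if m != member]  # re-adding updates
    zset.append((score, member))
    zset.sort()

def zrangebyscore(zset, lo, hi):
    return [m for s, m in zset if lo <= s <= hi]

myindex = []
zadd(myindex, 25, "Manuel")
zadd(myindex, 18, "Anna")
zadd(myindex, 35, "Jon")
zadd(myindex, 67, "Helen")
print(zrangebyscore(myindex, 20, 40))  # ['Manuel', 'Jon']
```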
By using the WITHSCORES option of ZRANGEBYSCORE
it is also possible
to obtain the scores associated with the returned elements.
The ZCOUNT
command can be used in order to retrieve the number of elements
within a given range, without actually fetching the elements, which is also
useful, especially given the fact the operation is executed in logarithmic
time regardless of the size of the range.
Ranges can be inclusive or exclusive, please refer to the ZRANGEBYSCORE
command documentation for more information.
Note: Using the ZREVRANGEBYSCORE
it is possible to query a range in
reversed order, which is often useful when data is indexed in a given
direction (ascending or descending) but we want to retrieve information
the other way around.
Using objects IDs as associated values
In the above example we associated names to ages. However in general we may want to index some field of an object which is stored elsewhere. Instead of using the sorted set value directly to store the data associated with the indexed field, it is possible to store just the ID of the object.
For example I may have Redis hashes representing users. Each user is represented by a single key, directly accessible by ID:
HMSET user:1 id 1 username antirez ctime 1444809424 age 38
HMSET user:2 id 2 username maria ctime 1444808132 age 42
HMSET user:3 id 3 username jballard ctime 1443246218 age 33
If I want to create an index in order to query users by their age, I could do:
ZADD user.age.index 38 1
ZADD user.age.index 42 2
ZADD user.age.index 33 3
This time the value associated with the score in the sorted set is the
ID of the object. So once I query the index with ZRANGEBYSCORE
I’ll
also have to retrieve the information I need with HGETALL
or similar
commands. The obvious advantage is that objects can change without touching
the index, as long as we don’t change the indexed field.
In the next examples we’ll almost always use IDs as values associated with the index, since this is usually the sounder design, with a few exceptions.
Updating simple sorted set indexes
Often we index things which change over time. In the above example, the age of the user changes every year. In such a case it would make sense to use the birth date as index instead of the age itself, but there are other cases where we simply want some field to change from time to time, and the index to reflect this change.
The ZADD
command makes updating simple indexes a very trivial operation
since re-adding back an element with a different score and the same value
will simply update the score and move the element to the right position,
so if the user antirez
turned 39 years old, in order to update the
data in the hash representing the user, and in the index as well, we need
to execute the following two commands:
HSET user:1 age 39
ZADD user.age.index 39 1
The operation may be wrapped in a MULTI
/EXEC
transaction in order to
make sure both fields are updated or none.
Turning multi dimensional data into linear data
Indexes created with sorted sets are able to index only a single numerical value. Because of this you may think it is impossible to index something which has multiple dimensions using this kind of index, but actually this is not always true. If you can efficiently represent something multi-dimensional in a linear way, then it is often possible to use a simple sorted set for indexing.
For example the Redis geo indexing API uses a sorted set to index places by latitude and longitude using a technique called Geo hash. The sorted set score represents alternating bits of longitude and latitude, so that we map the linear score of a sorted set to many small squares on the earth’s surface. By doing an 8+1 style center plus neighborhoods search it is possible to retrieve elements by radius.
Limits of the score
Sorted set elements scores are double precision floats. It means that
they can represent different decimal or integer values with different
errors, because they use an exponential representation internally.
However what is interesting for indexing purposes is that the score is
always able to represent without any error numbers between -9007199254740992
and 9007199254740992, which is -/+ 2^53
.
When representing much larger numbers, you need a different form of indexing that is able to index numbers at any precision, called a lexicographical index.
Lexicographical indexes
Redis sorted sets have an interesting property. When elements are added
with the same score, they are sorted lexicographically, comparing the
strings as binary data with the memcmp()
function.
For people that don’t know the C language nor the memcmp
function, what
it means is that elements with the same score are sorted comparing the
raw values of their bytes, byte after byte. If the first byte is the same,
the second is checked and so forth. If one string is a prefix of the other,
the longer string is considered the greater of the two,
so “foobar” is greater than “foo”.
There are commands such as ZRANGEBYLEX
and ZLEXCOUNT
that
are able to query and count ranges in a lexicographical fashion, assuming
they are used with sorted sets where all the elements have the same score.
This Redis feature is basically equivalent to a b-tree
data structure which
is often used in order to implement indexes with traditional databases.
As you can guess, because of this, it is possible to use this Redis data
structure in order to implement pretty fancy indexes.
Before we dive into using lexicographical indexes, let’s check how sorted sets behave in this special mode of operation. Since we need to add elements with the same score, we’ll always use the special score of zero.
ZADD myindex 0 baaa
ZADD myindex 0 abbb
ZADD myindex 0 aaaa
ZADD myindex 0 bbbb
Fetching all the elements from the sorted set immediately reveals that they are ordered lexicographically.
ZRANGE myindex 0 -1
1) "aaaa"
2) "abbb"
3) "baaa"
4) "bbbb"
Now we can use ZRANGEBYLEX
in order to perform range queries.
ZRANGEBYLEX myindex [a (b
1) "aaaa"
2) "abbb"
Note that in the range queries we prefixed the min
and max
elements
identifying the range with the special characters [
and (
.
These prefixes are mandatory, and they specify if the elements
of the range are inclusive or exclusive. So the range [a (b
means give me
all the elements lexicographically between a
inclusive and b
exclusive,
which are all the elements starting with a
.
There are also two more special characters indicating the infinitely negative
string and the infinitely positive string, which are -
and +
.
ZRANGEBYLEX myindex [b +
1) "baaa"
2) "bbbb"
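The inclusive/exclusive prefixes can be modeled with a few lines of Python (a sketch of the range semantics, not the real implementation):

```python
# A toy model of ZRANGEBYLEX over a sorted set where every element has
# score 0, so ordering is purely lexicographical (byte-wise).
def zrangebylex(members, lo, hi):
    def keep(m):
        if lo == "-":                   # infinitely negative string
            ok_lo = True
        elif lo.startswith("["):        # "[" means inclusive
            ok_lo = m >= lo[1:]
        else:                           # "(" means exclusive
            ok_lo = m > lo[1:]
        if hi == "+":                   # infinitely positive string
            ok_hi = True
        elif hi.startswith("["):
            ok_hi = m <= hi[1:]
        else:
            ok_hi = m < hi[1:]
        return ok_lo and ok_hi
    return [m for m in sorted(members) if keep(m)]

myindex = {"baaa", "abbb", "aaaa", "bbbb"}
print(zrangebylex(myindex, "[a", "(b"))  # ['aaaa', 'abbb']
print(zrangebylex(myindex, "[b", "+"))   # ['baaa', 'bbbb']
```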
That’s it basically. Let’s see how to use these features to build indexes.
A first example: completion
An interesting application of indexing is completion. Completion is what happens when you start typing your query into a search engine: the user interface will anticipate what you are likely typing, providing common queries that start with the same characters.
A naive approach to completion is to just add every single query we
get from the user into the index. For example if the user searches banana
we’ll just do:
ZADD myindex 0 banana
And so forth for each search query ever encountered. Then when we want to
complete the user input, we execute a range query using ZRANGEBYLEX
.
Imagine the user is typing “bit” inside the search form, and we want to
offer possible search keywords starting with “bit”. We send Redis a command
like this:
ZRANGEBYLEX myindex "[bit" "[bit\xff"
Basically we create a range using the string the user is typing right now
as start, and the same string plus a trailing byte set to 255, which is \xff
in the example, as the end of the range. This way we get all the strings that start with the string the user is typing.
Note that we don’t want too many items returned, so we may use the LIMIT option in order to reduce the number of results.
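A small self-contained sketch of the completion trick, with the index modeled as a Python set and only the inclusive `[` prefix handled:

```python
# Toy ZRANGEBYLEX handling only the inclusive "[" prefix, plus a LIMIT.
def zrangebylex(members, lo, hi, limit=None):
    res = [m for m in sorted(members)
           if lo[1:] <= m <= hi[1:]]   # strip the "[" from both bounds
    return res if limit is None else res[:limit]

searches = {"banana", "bitcoin", "bite", "bitmap", "zebra"}
# Range from the typed prefix to prefix + "\xff": every string that
# starts with "bit" sorts inside this interval.
print(zrangebylex(searches, "[bit", "[bit\xff", limit=10))
```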
Adding frequency into the mix
The above approach is a bit naive, because all the user searches are treated the same way. In a real system we want to complete strings according to their frequency: very popular searches will be proposed with a higher probability compared to search strings typed very rarely.
In order to implement something which depends on the frequency, and at the same time automatically adapts to future inputs, by purging searches that are no longer popular, we can use a very simple streaming algorithm.
To start, we modify our index in order to store not just the search term,
but also the frequency the term is associated with. So instead of just adding
banana
we add banana:1
, where 1 is the frequency.
ZADD myindex 0 banana:1
We also need logic in order to increment the index if the search term already exists in the index, so what we’ll actually do is something like this:
ZRANGEBYLEX myindex "[banana:" + LIMIT 0 1
1) "banana:1"
This will return the single entry of banana
if it exists. Then we
can increment the associated frequency and send the following two
commands:
ZREM myindex banana:1
ZADD myindex 0 banana:2
Note that because it is possible that there are concurrent updates, the above three commands should be sent via a Lua script instead, so that the Lua script will atomically get the old count and re-add the item with incremented score.
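The get/remove/re-add sequence can be sketched as a single helper (in Python, with the index as a plain set; on a real server this is the logic you would put inside the Lua script so it runs atomically):

```python
# Sketch of the frequency update. The index is modeled as a Python set of
# "term:count" strings; helper name increment_search is illustrative.
def increment_search(index, term):
    prefix = term + ":"
    matches = [m for m in sorted(index) if m.startswith(prefix)]
    if matches:
        old = matches[0]
        count = int(old[len(prefix):])
        index.discard(old)                  # ZREM myindex banana:1
        index.add(f"{prefix}{count + 1}")   # ZADD myindex 0 banana:2
    else:
        index.add(prefix + "1")             # first time this term is seen

idx = set()
increment_search(idx, "banana")
increment_search(idx, "banana")
print(sorted(idx))  # ['banana:2']
```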
So the result will be that, every time a user searches for banana
we’ll
get our entry updated.
There is more: our goal is to keep only items that are searched very frequently. So we need some form of purging. When we actually query the index in order to complete the user input, we may see something like this:
ZRANGEBYLEX myindex "[banana:" + LIMIT 0 10
1) "banana:123"
2) "banaooo:1"
3) "banned user:49"
4) "banning:89"
Apparently nobody searches for “banaooo”, for example, but the query was performed a single time, so we end up presenting it to the user.
This is what we can do. Out of the returned items, we pick a random one, decrement its score by one, and re-add it with the new score. However if the score reaches 0, we simply remove the item from the list. You can use much more advanced systems, but the idea is that the index in the long run will contain top searches, and if top searches will change over the time it will adapt automatically.
A refinement to this algorithm is to pick entries in the list according to their weight: the higher the score, the less likely an entry is to be picked for decrementing or eviction.
Normalizing strings for case and accents
In the completion examples we always used lowercase strings. However reality is much more complex than that: languages have capitalized names, accents, and so forth.
One simple way to deal with these issues is to normalize the string the user searches. Whether the user searches for “Banana”, “BANANA” or “Ba’nana” we may always turn it into “banana”.
However sometimes we may like to present the user with the original
item typed, even if we normalize the string for indexing. In order to
do this, what we do is to change the format of the index so that instead
of just storing term:frequency
we store normalized:frequency:original
like in the following example:
ZADD myindex 0 banana:273:Banana
Basically we add another field that we’ll extract and use only for visualization. Ranges will always be computed using the normalized strings instead. This is a common trick which has multiple applications.
Adding auxiliary information in the index
When using a sorted set in a direct way, we have two different attributes for each object: the score, which we use as an index, and an associated value. When using lexicographical indexes instead, the score is always set to 0 and basically not used at all. We are left with a single string, which is the element itself.
Like we did in the previous completion examples, we are still able to store associated data using separators. For example we used the colon in order to add the frequency and the original word for completion.
In general we can add any kind of associated value to our indexing key.
In order to use a lexicographical index to implement a simple key-value store
we just store the entry as key:value
:
ZADD myindex 0 mykey:myvalue
And search for the key with:
ZRANGEBYLEX myindex [mykey: + LIMIT 0 1
1) "mykey:myvalue"
Then we extract the part after the colon to retrieve the value. However a problem to solve in this case is collisions. The colon character may be part of the key itself, so it must be chosen in order to never collide with the key we add.
Since lexicographical ranges in Redis are binary safe you can use any byte or any sequence of bytes. However if you receive untrusted user input, it is better to use some form of escaping in order to guarantee that the separator will never happen to be part of the key.
For example if you use two null bytes as separator "\0\0"
, you may
want to always escape null bytes into two bytes sequences in your strings.
Numerical padding
Lexicographical indexes may seem useful only when the problem at hand is to index strings. Actually it is very simple to use this kind of index in order to perform indexing of arbitrary precision numbers.
In the ASCII character set, digits appear in the order from 0 to 9, so if we left-pad numbers with leading zeroes, the result is that comparing them as strings will order them by their numerical value.
ZADD myindex 0 00324823481:foo
ZADD myindex 0 12838349234:bar
ZADD myindex 0 00000000111:zap
ZRANGE myindex 0 -1
1) "00000000111:zap"
2) "00324823481:foo"
3) "12838349234:bar"
We effectively created an index using a numerical field which can be as big as we want. This also works with floating point numbers of any precision by making sure we left pad the numerical part with leading zeroes and the decimal part with trailing zeroes like in the following list of numbers:
01000000000000.11000000000000
01000000000000.02200000000000
00000002121241.34893482930000
00999999999999.00000000000000
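A quick check of the padding trick in Python: once zero-padded to a fixed width, string order and numeric order coincide:

```python
# Zero-padding turns numeric order into lexicographic (string) order.
nums = [324823481, 12838349234, 111]
padded = sorted(str(n).zfill(11) for n in nums)
print(padded)
# ['00000000111', '00324823481', '12838349234']

# Sorting the padded strings sorts the underlying numbers:
assert [int(p) for p in padded] == sorted(nums)
```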
Using numbers in binary form
Storing numbers in decimal may use too much memory. An alternative approach
is just to store numbers, for example 128 bit integers, directly in their
binary form. However for this to work, you need to store the numbers in
big endian format, so that the most significant bytes are stored before
the least significant bytes. This way when Redis compares the strings with
memcmp()
, it will effectively sort the numbers by their value.
Keep in mind that data stored in binary format is less observable for debugging, harder to parse and export. So it is definitely a trade off.
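Python’s struct module makes it easy to see why big endian is required here (a sketch; the `key` helper name is illustrative):

```python
import struct

# Pack 64-bit integers big endian: most significant byte first, so a
# byte-wise (memcmp-style) comparison matches numeric comparison.
def key(n):
    return struct.pack(">Q", n)

assert key(1) < key(256) < key(70000)

# Little endian breaks the ordering: 256 packs as 00 01 ... which
# compares byte-wise as smaller than 1 (01 00 ...).
assert struct.pack("<Q", 256) < struct.pack("<Q", 1)
```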
Composite indexes
So far we explored ways to index single fields. However we all know that SQL stores are able to create indexes using multiple fields. For example I may index products in a very large store by room number and price.
I need to run queries in order to retrieve all the products in a given room having a given price range. What I can do is to index each product in the following way:
ZADD myindex 0 0056:0028.44:90
ZADD myindex 0 0034:0011.00:832
Here the fields are room:price:product_id
. I used just four digits padding
in the example for simplicity. The auxiliary data (the product ID) does not
need any padding.
With an index like that, to get all the products in room 56 having a price between 10 and 30 dollars is very easy. We can just run the following command:
ZRANGEBYLEX myindex [0056:0010.00 [0056:0030.00
The above is called a composite index. Its effectiveness depends on the order of the fields and the queries I want to run. For example the above index cannot be used efficiently in order to get all the products having a specific price range regardless of the room number. However I can use the primary key in order to run queries regardless of the price, like give me all the products in room 44.
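A sketch of this composite index in Python, with entries modeled as padded strings in a sorted list (field widths and sample data are illustrative):

```python
# Build room:price:product_id entries, zero-padding room and price so
# that string comparison matches numeric comparison field by field.
def entry(room, price, pid):
    return f"{room:04d}:{price:07.2f}:{pid}"

index = sorted([entry(56, 28.44, 90), entry(34, 11.00, 832),
                entry(56, 12.50, 101), entry(56, 99.99, 7)])

# ZRANGEBYLEX myindex [0056:0010.00 [0056:0030.00 (both bounds inclusive)
lo, hi = "0056:0010.00", "0056:0030.00"
print([e for e in index if lo <= e <= hi])
```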
Composite indexes are very powerful, and are used in traditional stores in order to optimize complex queries. In Redis they could be useful both to implement a very fast in-memory Redis index of something stored into a traditional data store, or in order to directly index Redis data.
Updating lexicographical indexes
The value of the index in a lexicographical index can get pretty fancy and hard or slow to rebuild from what we store about the object. So one approach to simplify the handling of the index, at the cost of using more memory, is to keep, alongside the sorted set representing the index, a hash mapping the object ID to the current index value.
So for example, when we index we also add to a hash:
MULTI
ZADD myindex 0 0056:0028.44:90
HSET index.content 90 0056:0028.44:90
EXEC
This is not always needed, but simplifies the operations of updating
the index. In order to remove the old information we indexed for the object
ID 90, regardless of the current fields values of the object, we just
have to retrieve the hash value by object ID and ZREM
it in the sorted
set view.
Representing and querying graphs using a hexastore
One cool thing about composite indexes is that they are handy in order to represent graphs, using a data structure which is called Hexastore.
The hexastore provides a representation for relations between objects, formed by a subject, a predicate and an object. A simple relation between objects could be:
antirez is-friend-of matteocollina
In order to represent this relation I can store the following element in my lexicographical index:
ZADD myindex 0 spo:antirez:is-friend-of:matteocollina
Note that I prefixed my item with the string spo. It means that the item represents a subject,predicate,object relation.
I can add 5 more entries for the same relation, but in a different order:
ZADD myindex 0 sop:antirez:matteocollina:is-friend-of
ZADD myindex 0 ops:matteocollina:is-friend-of:antirez
ZADD myindex 0 osp:matteocollina:antirez:is-friend-of
ZADD myindex 0 pso:is-friend-of:antirez:matteocollina
ZADD myindex 0 pos:is-friend-of:matteocollina:antirez
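The six orderings can be generated programmatically; a small Python sketch (the `hexastore_entries` helper is illustrative):

```python
from itertools import permutations

# Generate all six orderings of a (subject, predicate, object) triple,
# each prefixed with its permutation name (spo, sop, pso, ...).
def hexastore_entries(s, p, o):
    named = {"s": s, "p": p, "o": o}
    return ["".join(order) + ":" + ":".join(named[c] for c in order)
            for order in permutations("spo")]

entries = hexastore_entries("antirez", "is-friend-of", "matteocollina")
print(sorted(entries))
```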
Now things start to be interesting, and I can query the graph in many
different ways. For example, who are all the people antirez
is friend of?
ZRANGEBYLEX myindex "[spo:antirez:is-friend-of:" "[spo:antirez:is-friend-of:\xff"
1) "spo:antirez:is-friend-of:matteocollina"
2) "spo:antirez:is-friend-of:wonderwoman"
3) "spo:antirez:is-friend-of:spiderman"
Or, what are all the relationships antirez
and matteocollina
have where
the first is the subject and the second is the object?
ZRANGEBYLEX myindex "[sop:antirez:matteocollina:" "[sop:antirez:matteocollina:\xff"
1) "sop:antirez:matteocollina:is-friend-of"
2) "sop:antirez:matteocollina:was-at-conference-with"
3) "sop:antirez:matteocollina:talked-with"
By combining different queries, I can ask fancy questions. For example:
Who are all my friends that like beer, live in Barcelona, and that matteocollina considers friends as well?
To get this information I start with an spo
query to find all the people
I’m friend with. Then for each result I get I perform an spo
query
to check if they like beer, removing the ones for which I can’t find
this relation. I do it again to filter by city. Finally I perform an ops
query to find, of the list I obtained, who is considered friend by
matteocollina.
Make sure to check Matteo Collina’s slides about Levelgraph in order to better understand these ideas.
Multi dimensional indexes
A more complex type of index is an index that allows you to perform queries where two or more variables are queried at the same time for specific ranges. For example I may have a data set representing people’s age and salary, and I want to retrieve all the people between 50 and 55 years old having a salary between 70000 and 85000.
This query may be performed with a multi column index, but this requires us to select the first variable and then scan the second, which means we may do a lot more work than needed. It is possible to perform these kinds of queries involving multiple variables using different data structures. For example, multi-dimensional trees such as k-d trees or r-trees are sometimes used. Here we’ll describe a different way to index data into multiple dimensions, using a representation trick that allows us to perform the query in a very efficient way using Redis lexicographical ranges.
Let’s start by visualizing the problem. In this picture we have points
in the space, which represent our data samples, where x
and y
are
our coordinates. Both variables have a maximum value of 400.
The blue box in the picture represents our query. We want all the points
where x
is between 50 and 100, and where y
is between 100 and 300.
In order to represent data that makes these kinds of queries fast to perform, we start by padding our numbers with 0. So for example imagine we want to add the point 10,25 (x,y) to our index. Given that the maximum range in the example is 400 we can just pad to three digits, so we obtain:
x = 010
y = 025
Now what we do is to interleave the digits, taking the leftmost digit in x, and the leftmost digit in y, and so forth, in order to create a single number:
001205
This is our index, however in order to more easily reconstruct the original representation, if we want (at the cost of space), we may also add the original values as additional columns:
001205:10:25
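The digit interleaving can be sketched in a few lines of Python (the `interleave` helper is illustrative):

```python
# Interleave the digits of two zero-padded coordinates into one string:
# leftmost digit of x, then leftmost digit of y, and so forth.
def interleave(x, y, width=3):
    xs, ys = str(x).zfill(width), str(y).zfill(width)
    return "".join(a + b for a, b in zip(xs, ys))

print(interleave(10, 25))   # '001205'
print(interleave(75, 200))  # '027050'
```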
Now, let’s reason about this representation and why it is useful in the context of range queries. For example, let’s take the center of our blue box, which is at x=75 and y=200. We can encode this number as we did earlier by interleaving the digits, obtaining:
027050
What happens if we substitute the last two digits respectively with 00 and 99? We obtain a range which is lexicographically continuous:
027000 to 027099
What this maps to is a square representing all values where the x variable is between 70 and 79, and the y variable is between 200 and 209. We can write random points in this interval, in order to identify this specific area:
So the above lexicographic query allows us to easily query for points in a specific square in the picture. However the square may be too small for the box we are searching, so that too many queries are needed. So we can do the same but instead of replacing the last two digits with 00 and 99, we can do it for the last four digits, obtaining the following range:
020000 to 029999
This time the range represents all the points where x is between 0 and 99 and y is between 200 and 299. Drawing random points in this interval shows us this larger area:
Oops, now our area is way too big for our query, and still our search box is not completely included. We need more granularity, but we can easily obtain it by representing our numbers in binary form. This time, when we replace digits, instead of getting squares which are ten times bigger, we get squares which are just two times bigger.
Our numbers in binary form, assuming we need just 9 bits for each variable (in order to represent numbers up to 400 in value) would be:
x = 75 -> 001001011
y = 200 -> 011001000
So by interleaving digits, our representation in the index would be:
000111000011001010:75:200
Let’s see what our ranges are as we substitute the last 2, 4, 6, 8, … bits with 0s and 1s in the interleaved representation:
2 bits: x between 74 and 75, y between 200 and 201 (range=2)
4 bits: x between 72 and 75, y between 200 and 203 (range=4)
6 bits: x between 72 and 79, y between 200 and 207 (range=8)
8 bits: x between 64 and 79, y between 192 and 207 (range=16)
And so forth. Now we have definitely better granularity!
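These ranges can be double-checked with a small Ruby sketch (square_for is a made-up helper): substituting N interleaved bits means substituting N/2 bits per coordinate, and clearing those bits gives the start of the square while setting them gives the end:

```ruby
# For a point (x, y), compute the square covered when the last
# `bits_per_coord` bits of each coordinate are substituted.
def square_for(x, y, bits_per_coord)
  mask = (1 << bits_per_coord) - 1
  [[x & ~mask, x | mask], [y & ~mask, y | mask]]
end

p square_for(75, 200, 2)   # 4 interleaved bits => [[72, 75], [200, 203]]
p square_for(75, 200, 4)   # 8 interleaved bits => [[64, 79], [192, 207]]
```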
As you can see, substituting N bits from the index gives us search boxes of side 2^(N/2).
So what we do is check the dimension where our search box is smaller, and find the nearest power of two to this number. Our search box was 50,100 to 100,300, so it has a width of 50 and a height of 200. We take the smaller of the two, 50, and the nearest power of two is 64. 64 is 2^6, so we would work with indexes obtained by replacing the last 12 bits of the interleaved representation (so that we end up replacing just 6 bits of each variable).
However, single squares may not cover all of our search box, so we may need more. What we do is start with the bottom left corner of our search box, which is 50,100, and find the first range by substituting the last 6 bits in each number with 0. Then we do the same with the top right corner.
With two trivial nested for loops where we increment only the significant bits, we can find all the squares between these two. For each square we convert the two numbers into our interleaved representation, and create the range using the converted representation as our start, and the same representation but with the last 12 bits turned on as the end of the range.
For each square found we perform our query and get the elements inside, removing the elements which are outside our search box.
Turning this into code is simple. Here is a Ruby example:
def spacequery(x0,y0,x1,y1,exp)
    bits=exp*2
    x_start = x0/(2**exp)
    x_end = x1/(2**exp)
    y_start = y0/(2**exp)
    y_end = y1/(2**exp)
    (x_start..x_end).each{|x|
        (y_start..y_end).each{|y|
            x_range_start = x*(2**exp)
            x_range_end = x_range_start | ((2**exp)-1)
            y_range_start = y*(2**exp)
            y_range_end = y_range_start | ((2**exp)-1)
            puts "#{x},#{y} x from #{x_range_start} to #{x_range_end}, y from #{y_range_start} to #{y_range_end}"

            # Turn it into interleaved form for ZRANGEBYLEX query.
            # We assume we need 9 bits for each integer, so the final
            # interleaved representation will be 18 bits.
            xbin = x_range_start.to_s(2).rjust(9,'0')
            ybin = y_range_start.to_s(2).rjust(9,'0')
            s = xbin.split("").zip(ybin.split("")).flatten.compact.join("")
            # Now that we have the start of the range, calculate the end
            # by replacing the specified number of bits from 0 to 1.
            e = s[0..-(bits+1)]+("1"*bits)
            puts "ZRANGEBYLEX myindex [#{s} [#{e}"
        }
    }
end

spacequery(50,100,100,300,6)
While not immediately trivial, this is a very useful indexing strategy that in the future may be implemented in Redis in a native way. For now, the good news is that the complexity can easily be encapsulated inside a library used to perform indexing and queries. One example of such a library is Redimension, a proof of concept Ruby library which indexes N-dimensional data inside Redis using the technique described here.
Multi dimensional indexes with negative or floating point numbers
The simplest way to represent negative values is to work with unsigned integers and use an offset, so that before translating numbers into the indexed representation, you add the absolute value of your smallest negative integer.
For floating point numbers, the simplest approach is probably to convert them to integers by multiplying the number by a power of ten proportional to the number of digits after the dot you want to retain.
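As a sketch of both ideas in Ruby, assuming the smallest value we accept is -100 and we want to keep two digits after the dot (both the offset and the scale are arbitrary choices for the example):

```ruby
OFFSET = 100    # absolute value of the smallest accepted negative number
SCALE  = 100    # 10^2, to retain two digits after the dot

# Map a possibly negative float to a non-negative integer for indexing.
def encode(value)
  ((value + OFFSET) * SCALE).round
end

p encode(-100.0)   # => 0
p encode(0.5)      # => 10050
```

The encoded integers preserve the original ordering, so they can be interleaved and range-queried exactly as described above.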
Non range indexes
So far we checked indexes which are useful to query by range or by single item. However, other Redis data structures such as Sets or Lists can be used in order to build other kinds of indexes. They are very commonly used, but maybe we don’t always realize they are actually a form of indexing.
For instance, I can index object IDs into a Set data type in order to use the get random elements operation via SRANDMEMBER to retrieve a set of random objects. Sets can also be used to check for existence, when all I need is to test whether a given item exists or has a given boolean property.
Similarly, lists can be used in order to index items into a fixed order. I can add all my items into a Redis list and rotate the list with RPOPLPUSH using the same key name as source and destination. This is useful when I want to process a given set of items again and again forever in the same order. Think of an RSS feed system that needs to refresh the local copy periodically.
Another popular index often used with Redis is a capped list, where items are added with LPUSH and trimmed with LTRIM, in order to create a view with just the latest N items encountered, in the same order they were seen.
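For example, a capped index of the latest 1000 items seen could be maintained like this (the key name latest.items is just for illustration):

```
LPUSH latest.items <item-id>
LTRIM latest.items 0 999
```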
Index inconsistency
Keeping the index updated may be challenging: in the course of months or years it is possible that inconsistencies are introduced because of software bugs, network partitions or other events.
Different strategies could be used. If the index data is outside Redis, read repair can be a solution, where data is fixed in a lazy way when it is requested. When we index data which is stored in Redis itself, the SCAN family of commands can be used in order to verify, update or rebuild the index from scratch, incrementally.
14.4 - Redis patterns example
This article describes the design and implementation of a very simple Twitter clone written using PHP with Redis as the only database. The programming community has traditionally considered key-value stores as a special purpose database that couldn’t be used as a drop-in replacement for a relational database for the development of web applications. This article will try to show that Redis data structures on top of a key-value layer are an effective data model to implement many kinds of applications.
Before continuing, you may want to spend a few seconds playing with the Retwis online demo, to check what we are going to actually model. Long story short: it is a toy, but complex enough to be a foundation in order to learn how to create more complex applications.
Note: the original version of this article was written in 2009 when Redis was released. It was not exactly clear at that time that the Redis data model was suitable to write entire applications. Now after 5 years there are many cases of applications using Redis as their main store, so the goal of the article today is to be a tutorial for Redis newcomers. You’ll learn how to design a simple data layout using Redis, and how to apply different data structures.
Our Twitter clone, called Retwis, is structurally simple, has very good performance, and can be distributed among any number of web and Redis servers with little effort. You can find the source code here.
I used PHP for the example since it can be read by everybody. The same (or better) results can be obtained using Ruby, Python, Erlang, and so on. A few clones exist (however not all the clones use the same data layout as the current version of this tutorial, so please, stick with the official PHP implementation for the sake of following the article better).
- Retwis-RB is a port of Retwis to Ruby and Sinatra written by Daniel Lucraft! Full source code is included of course, and a link to its Git repository appears in the footer of this article. The rest of this article targets PHP, but Ruby programmers can also check the Retwis-RB source code since it’s conceptually very similar.
- Retwis-J is a port of Retwis to Java, using the Spring Data Framework, written by Costin Leau. Its source code can be found on GitHub, and there is comprehensive documentation available at springsource.org.
What is a key-value store?
The essence of a key-value store is the ability to store some data, called a value, inside a key. The value can be retrieved later only if we know the specific key it was stored in. There is no direct way to search for a key by value. In some sense, it is like a very large hash/dictionary, but it is persistent, i.e. when your application ends, the data doesn’t go away. So, for example, I can use the command SET to store the value bar in the key foo:
SET foo bar
Redis stores data permanently, so if I later ask “What is the value stored in key foo?” Redis will reply with bar:
GET foo => bar
Other common operations provided by key-value stores are DEL, to delete a given key and its associated value, SET-if-not-exists (called SETNX on Redis), to assign a value to a key only if the key does not already exist, and INCR, to atomically increment a number stored in a given key:
SET foo 10
INCR foo => 11
INCR foo => 12
INCR foo => 13
Atomic operations
There is something special about INCR. You may wonder why Redis provides such an operation if we can do it ourselves with a bit of code. After all, it is as simple as:
x = GET foo
x = x + 1
SET foo x
The problem is that incrementing this way will work as long as there is only one client working with the key foo at one time. See what happens if two clients are accessing this key at the same time:
x = GET foo (yields 10)
y = GET foo (yields 10)
x = x + 1 (x is now 11)
y = y + 1 (y is now 11)
SET foo x (foo is now 11)
SET foo y (foo is now 11)
Something is wrong! We incremented the value two times, but instead of going from 10 to 12, our key holds 11. This is because the increment done with GET / increment / SET is not an atomic operation. Instead, the INCR provided by Redis, Memcached and so on are atomic implementations, and the server will take care of protecting the key during the time needed to complete the increment in order to prevent simultaneous accesses.
What makes Redis different from other key-value stores is that it provides other operations similar to INCR that can be used to model complex problems. This is why you can use Redis to write whole web applications without using another database like an SQL database, and without going crazy.
Beyond key-value stores: lists
In this section we will see which Redis features we need to build our Twitter clone. The first thing to know is that Redis values can be more than strings. Redis supports Lists, Sets, Hashes, Sorted Sets, Bitmaps, and HyperLogLog types as values, and there are atomic operations to operate on them so we are safe even with multiple accesses to the same key. Let’s start with Lists:
LPUSH mylist a (now mylist holds 'a')
LPUSH mylist b (now mylist holds 'b','a')
LPUSH mylist c (now mylist holds 'c','b','a')
LPUSH means Left Push, that is, add an element to the left (or to the head) of the list stored in mylist. If the key mylist does not exist it is automatically created as an empty list before the PUSH operation. As you can imagine, there is also an RPUSH operation that adds the element to the right of the list (on the tail). This is very useful for our Twitter clone. User updates can be added to a list stored in username:updates, for instance.
There are operations to get data from Lists, of course. For instance, LRANGE returns a range from the list, or the whole list.
LRANGE mylist 0 1 => c,b
LRANGE uses zero-based indexes, that is, the first element is 0, the second 1, and so on. The command arguments are LRANGE key first-index last-index. The last-index argument can be negative, with a special meaning: -1 is the last element of the list, -2 the penultimate, and so on. So, to get the whole list use:
LRANGE mylist 0 -1 => c,b,a
Other important operations are LLEN, which returns the number of elements in the list, and LTRIM, which is like LRANGE but instead of returning the specified range trims the list to it: it is like “get range from mylist, set this range as the new value”, but done atomically.
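Continuing the example above, trimming mylist to its first two elements would look like this:

```
LTRIM mylist 0 1 (now mylist holds just 'c','b')
```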
The Set data type
Currently we don’t use the Set type in this tutorial, but since we use Sorted Sets, which are kind of a more capable version of Sets, it is better to start introducing Sets first (which are a very useful data structure per se), and later Sorted Sets.
There are more data types than just Lists. Redis also supports Sets, which are unsorted collections of elements. It is possible to add, remove, and test for existence of members, and perform the intersection between different Sets. Of course it is possible to get the elements of a Set. Some examples will make it more clear. Keep in mind that SADD is the add to set operation, SREM is the remove from set operation, SISMEMBER is the test if member operation, and SINTER is the perform intersection operation. Other operations are SCARD to get the cardinality (the number of elements) of a Set, and SMEMBERS to return all the members of a Set.
SADD myset a
SADD myset b
SADD myset foo
SADD myset bar
SCARD myset => 4
SMEMBERS myset => bar,a,foo,b
Note that SMEMBERS does not return the elements in the same order we added them, since Sets are unsorted collections of elements. When you want to store in order it is better to use Lists instead. Some more operations against Sets:
SADD mynewset b
SADD mynewset foo
SADD mynewset hello
SINTER myset mynewset => foo,b
SINTER can return the intersection between Sets, but it is not limited to two Sets. You may ask for the intersection of 4, 5, or 10000 Sets. Finally let’s check how SISMEMBER works:
SISMEMBER myset foo => 1
SISMEMBER myset notamember => 0
The Sorted Set data type
Sorted Sets are similar to Sets: collections of elements. However, in Sorted Sets each element is associated with a floating point value, called the element score. Because of the score, elements inside a Sorted Set are ordered, since we can always compare two elements by score (and if the score happens to be the same, we compare the two elements as strings).
As with Sets, in Sorted Sets it is not possible to add repeated elements: every element is unique. However, it is possible to update an element’s score.
Sorted Set commands are prefixed with Z. The following is an example of Sorted Sets usage:
ZADD zset 10 a
ZADD zset 5 b
ZADD zset 12.55 c
ZRANGE zset 0 -1 => b,a,c
In the above example we added a few elements with ZADD, and later retrieved the elements with ZRANGE. As you can see the elements are returned in order according to their score. In order to check if a given element exists, and also to retrieve its score if it exists, we use the ZSCORE command:
ZSCORE zset a => 10
ZSCORE zset non_existing_element => NULL
Sorted Sets are a very powerful data structure: you can query elements by score range, lexicographically, in reverse order, and so forth. To know more please check the Sorted Set sections in the official Redis commands documentation.
The Hash data type
This is the last data structure we use in our program, and it is extremely easy to grasp, since there is an equivalent in almost every programming language out there: Hashes. Redis Hashes are basically like Ruby or Python hashes, a collection of fields associated with values:
HMSET myuser name Salvatore surname Sanfilippo country Italy
HGET myuser surname => Sanfilippo
HMSET can be used to set fields in the hash, which can be retrieved with HGET later. It is possible to check if a field exists with HEXISTS, or to increment a hash field with HINCRBY, and so forth.
Hashes are the ideal data structure to represent objects. For example we use Hashes in order to represent Users and Updates in our Twitter clone.
Okay, we just covered the basics of the main Redis data structures, so we are ready to start coding!
Prerequisites
If you haven’t downloaded the Retwis source code already please grab it now. It contains a few PHP files, and also a copy of Predis, the PHP client library we use in this example.
Another thing you probably want is a working Redis server. Just get the source, build with make, run with ./redis-server, and you’re ready to go. No configuration is required at all in order to play with or run Retwis on your computer.
Data layout
When working with a relational database, a database schema must be designed so that we know the tables, indexes, and so on that the database will contain. We don’t have tables in Redis, so what do we need to design? We need to identify what keys are needed to represent our objects and what kind of values these keys need to hold.
Let’s start with Users. We need to represent users, of course, with their username, user ID, password, the set of users following a given user, the set of users a given user follows, and so on. The first question is, how should we identify a user? Like in a relational DB, a good solution is to identify different users with different numbers, so we can associate a unique ID with every user. Every other reference to this user will be done by ID. Creating unique IDs is very simple to do by using our atomic INCR operation. When we create a new user we can do something like this, assuming the user is called “antirez”:
INCR next_user_id => 1000
HMSET user:1000 username antirez password p1pp0
Note: you should use a hashed password in a real application; for simplicity we store the password in clear text.
We use the next_user_id key in order to always get a unique ID for every new user. Then we use this unique ID to name the key holding a Hash with the user’s data. This is a common design pattern with key-value stores! Keep it in mind.
Besides the fields already defined, we need some more stuff in order to fully define a User. For example, sometimes it can be useful to be able to get the user ID from the username, so every time we add a user, we also populate the users key, which is a Hash, with the username as field, and its ID as value.
HSET users antirez 1000
This may appear strange at first, but remember that we are only able to access data in a direct way, without secondary indexes. It’s not possible to tell Redis to return the key that holds a specific value. This is also our strength. This new paradigm is forcing us to organize data so that everything is accessible by primary key, speaking in relational DB terms.
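With the users Hash populated, resolving a username back to its ID is a single direct lookup:

```
HGET users antirez => 1000
```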
Followers, following, and updates
There is another central need in our system. A user might have users who follow them, which we’ll call their followers. A user might follow other users, which we’ll call a following. We have a perfect data structure for this. That is… Sets. The uniqueness of Sets elements, and the fact we can test in constant time for existence, are two interesting features. However what about also remembering the time at which a given user started following another one? In an enhanced version of our simple Twitter clone this may be useful, so instead of using a simple Set, we use a Sorted Set, using the user ID of the following or follower user as element, and the unix time at which the relation between the users was created, as our score.
So let’s define our keys:
followers:1000 => Sorted Set of uids of all the followers users
following:1000 => Sorted Set of uids of all the following users
We can add new followers with:
ZADD followers:1000 1401267618 1234 => Add user 1234 with time 1401267618
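Reading the relation back is just as direct. For example, to list all follower IDs of user 1000, or to recover when user 1234 started following:

```
ZRANGE followers:1000 0 -1 => 1234
ZSCORE followers:1000 1234 => 1401267618
```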
Another important thing we need is a place where we can add the updates to display in the user’s home page. We’ll need to access this data in chronological order later, from the most recent update to the oldest, so the perfect kind of data structure for this is a List. Basically every new update will be LPUSHed in the user updates key, and thanks to LRANGE, we can implement pagination and so on. Note that we use the words updates and posts interchangeably, since updates are actually “little posts” in some way.
posts:1000 => a List of post ids - every new post is LPUSHed here.
This list is basically the User timeline. We’ll push the IDs of the user's own posts, and the IDs of all the posts created by the users they follow. Basically, we’ll implement a write fanout.
Authentication
OK, we have more or less everything about the user except for authentication. We’ll handle authentication in a simple but robust way: we don’t want to use PHP sessions, as our system must be ready to be distributed among different web servers easily, so we’ll keep the whole state in our Redis database. All we need is a random unguessable string to set as the cookie of an authenticated user, and a key that will contain the user ID of the client holding the string.
We need two things in order to make this work in a robust way. First: the current authentication secret (the random unguessable string) should be part of the User object, so when the user is created we also set an auth field in its Hash:
HSET user:1000 auth fea5e81ac8ca77622bed1c2132a021f9
Moreover, we need a way to map authentication secrets to user IDs, so we also take an auths key, which has as value a Hash type mapping authentication secrets to user IDs.
HSET auths fea5e81ac8ca77622bed1c2132a021f9 1000
In order to authenticate a user we’ll do these simple steps (see the login.php file in the Retwis source code):
- Get the username and password via the login form.
- Check if the username field actually exists in the users Hash.
- If it exists we have the user ID (i.e. 1000).
- Check if user:1000 password matches, if not, return an error message.
- Ok authenticated! Set “fea5e81ac8ca77622bed1c2132a021f9” (the value of the user:1000 auth field) as the “auth” cookie.
This is the actual code:
include("retwis.php");

# Form sanity checks
if (!gt("username") || !gt("password"))
    goback("You need to enter both username and password to login.");

# The form is ok, check if the username is available
$username = gt("username");
$password = gt("password");
$r = redisLink();
$userid = $r->hget("users",$username);
if (!$userid)
    goback("Wrong username or password");
$realpassword = $r->hget("user:$userid","password");
if ($realpassword != $password)
    goback("Wrong username or password");

# Username / password OK, set the cookie and redirect to index.php
$authsecret = $r->hget("user:$userid","auth");
setcookie("auth",$authsecret,time()+3600*24*365);
header("Location: index.php");
This happens every time a user logs in, but we also need a function isLoggedIn in order to check if a given user is already authenticated or not. These are the logical steps performed by the isLoggedIn function:
- Get the “auth” cookie from the user. If there is no cookie, the user is not logged in, of course. Let’s call the value of the cookie <authcookie>.
- Check if the <authcookie> field in the auths Hash exists, and what the value (the user ID) is (1000 in the example).
- In order for the system to be more robust, also verify that the user:1000 auth field matches.
- OK, the user is authenticated, and we load a bit of information into the $User global variable.
The code is simpler than the description, possibly:
function isLoggedIn() {
    global $User, $_COOKIE;

    if (isset($User)) return true;

    if (isset($_COOKIE['auth'])) {
        $r = redisLink();
        $authcookie = $_COOKIE['auth'];
        if ($userid = $r->hget("auths",$authcookie)) {
            if ($r->hget("user:$userid","auth") != $authcookie) return false;
            loadUserInfo($userid);
            return true;
        }
    }
    return false;
}

function loadUserInfo($userid) {
    global $User;

    $r = redisLink();
    $User['id'] = $userid;
    $User['username'] = $r->hget("user:$userid","username");
    return true;
}
Having loadUserInfo as a separate function is overkill for our application, but it’s a good approach in a complex application. The only thing that’s missing from all the authentication is the logout. What do we do on logout? That’s simple, we’ll just change the random string in the user:1000 auth field, remove the old authentication secret from the auths Hash, and add the new one.
Important: the logout procedure explains why we don’t just authenticate the user after looking up the authentication secret in the auths Hash, but double check it against the user:1000 auth field. The true authentication string is the latter, while the auths Hash is just an authentication field that may even be volatile, or, if there are bugs in the program or a script gets interrupted, we may even end up with multiple entries in the auths key pointing to the same user ID. The logout code is the following (logout.php):
include("retwis.php");

if (!isLoggedIn()) {
    header("Location: index.php");
    exit;
}

$r = redisLink();
$newauthsecret = getrand();
$userid = $User['id'];
$oldauthsecret = $r->hget("user:$userid","auth");

$r->hset("user:$userid","auth",$newauthsecret);
$r->hset("auths",$newauthsecret,$userid);
$r->hdel("auths",$oldauthsecret);

header("Location: index.php");
That is just what we described and should be simple to understand.
Updates
Updates, also known as posts, are even simpler. In order to create a new post in the database we do something like this:
INCR next_post_id => 10343
HMSET post:10343 user_id $owner_id time $time body "I'm having fun with Retwis"
As you can see, each post is just represented by a Hash with three fields: the ID of the user owning the post, the time at which the post was published, and finally, the body of the post, which is the actual status message.
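So fetching a post back is a single HGETALL (the user_id and time values here are just illustrative):

```
HGETALL post:10343 => user_id,1000,time,1401267618,body,"I'm having fun with Retwis"
```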
After we create a post and we obtain the post ID, we need to LPUSH the ID in the timeline of every user that is following the author of the post, and of course in the list of posts of the author itself (everybody is virtually following herself/himself). This is the file post.php that shows how this is performed:
include("retwis.php");

if (!isLoggedIn() || !gt("status")) {
    header("Location:index.php");
    exit;
}

$r = redisLink();
$postid = $r->incr("next_post_id");
$status = str_replace("\n"," ",gt("status"));
$r->hmset("post:$postid","user_id",$User['id'],"time",time(),"body",$status);
$followers = $r->zrange("followers:".$User['id'],0,-1);
$followers[] = $User['id']; /* Add the post to our own posts too */

foreach($followers as $fid) {
    $r->lpush("posts:$fid",$postid);
}

# Push the post on the timeline, and trim the timeline to the
# newest 1000 elements.
$r->lpush("timeline",$postid);
$r->ltrim("timeline",0,1000);

header("Location: index.php");
The core of the function is the foreach loop. We use ZRANGE to get all the followers of the current user, then the loop will LPUSH the post into every follower's timeline List.
Note that we also maintain a global timeline for all the posts, so that in the Retwis home page we can show everybody’s updates easily. This requires just doing an LPUSH to the timeline List. Let’s face it, aren’t you starting to think it was a bit strange to have to sort things added in chronological order using ORDER BY with SQL? I think so.
There is an interesting thing to notice in the code above: we used a new command called LTRIM after we perform the LPUSH operation in the global timeline. This is used in order to trim the list to just 1000 elements. The global timeline is actually only used in order to show a few posts in the home page, there is no need to have the full history of all the posts. Basically LTRIM + LPUSH is a way to create a capped collection in Redis.
Paginating updates
Now it should be pretty clear how we can use LRANGE in order to get ranges of posts, and render these posts on the screen. The code is simple:
function showPost($id) {
    $r = redisLink();
    $post = $r->hgetall("post:$id");
    if (empty($post)) return false;

    $userid = $post['user_id'];
    $username = $r->hget("user:$userid","username");
    $elapsed = strElapsed($post['time']);
    $userlink = "<a class=\"username\" href=\"profile.php?u=".urlencode($username)."\">".utf8entities($username)."</a>";

    echo('<div class="post">'.$userlink.' '.utf8entities($post['body'])."<br>");
    echo('<i>posted '.$elapsed.' ago via web</i></div>');
    return true;
}

function showUserPosts($userid,$start,$count) {
    $r = redisLink();
    $key = ($userid == -1) ? "timeline" : "posts:$userid";
    $posts = $r->lrange($key,$start,$start+$count);
    $c = 0;
    foreach($posts as $p) {
        if (showPost($p)) $c++;
        if ($c == $count) break;
    }
    return count($posts) == $count+1;
}
showPost will simply convert and print a Post in HTML, while showUserPosts gets a range of posts and then passes them to showPost.
Note: LRANGE is not very efficient if the list of posts starts to be very big and we want to access elements which are in the middle of the list, since Redis Lists are backed by linked lists. If a system is designed for deep pagination of millions of items, it is better to resort to Sorted Sets instead.
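A possible sketch of that alternative (the key name posts.by_time:1000 and the use of the post time as score are assumptions for illustration): store post IDs in a Sorted Set scored by post time, so deep pages can be reached by score instead of by walking a linked list:

```
ZADD posts.by_time:1000 1401267618 10343
ZREVRANGEBYSCORE posts.by_time:1000 +inf -inf LIMIT 100000 10
```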
Following users
It is not hard, but we did not yet check how we create following / follower relationships. If user ID 1000 (antirez) wants to follow user ID 5000 (pippo), we need to create both a following and a follower relationship. We just need two ZADD calls, using the current unix time as score:
ZADD following:1000 1401267618 5000
ZADD followers:5000 1401267618 1000
Note the same pattern again and again. In theory with a relational database, the list of following and followers would be contained in a single table with fields like following_id and follower_id. You can extract the followers or following of every user using an SQL query. With a key-value DB things are a bit different, since we need to set both the 1000 is following 5000 and 5000 is followed by 1000 relations. This is the price to pay, but on the other hand accessing the data is simpler and extremely fast. Having these things as separate sets allows us to do interesting stuff. For example, using ZINTERSTORE we can have the intersection of the followers of two different users, so we may add a feature to our Twitter clone so that it is able to tell you very quickly when you visit somebody else’s profile, “you and Alice have 34 followers in common”, and things like that.
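A sketch of the “followers in common” idea (the destination key name is arbitrary, and the count is illustrative): intersect the two followers Sorted Sets into a temporary key and count the result:

```
ZINTERSTORE followers.common 2 followers:1000 followers:5000
ZCARD followers.common => 34
```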
You can find the code that sets or removes a following / follower relation in the follow.php file.
Making it horizontally scalable
Gentle reader, if you read till this point you are already a hero. Thank you. Before talking about scaling horizontally it is worth checking performance on a single server. Retwis is extremely fast, without any kind of cache. On a very slow and loaded server, an Apache benchmark with 100 parallel clients issuing 100000 requests measured the average pageview to take 5 milliseconds. This means you can serve millions of users every day with just a single Linux box, and this one was monkey ass slow… Imagine the results with more recent hardware.
However, you can’t go with a single server forever. How do you scale a key-value store?
Retwis does not perform any multi-key operation, so making it scalable is simple: you may use client-side sharding, a sharding proxy like Twemproxy, or the upcoming Redis Cluster.
To know more about those topics please read our documentation about sharding. The point to stress here is that in a key-value store, if you design with care, the data set is split among many small independent keys. Distributing those keys across multiple nodes is more straightforward and predictable compared to using a semantically more complex database system.
15 - Command key specifications
Many of the commands in Redis accept key names as input arguments.
The 9th element in the reply of COMMAND
(and COMMAND INFO
) is an array that consists of the command’s key specifications.
A key specification describes a rule for extracting the names of one or more keys from the arguments of a given command. Key specifications provide a robust and flexible mechanism, compared to the first key, last key and step scheme employed until Redis 7.0. Before introducing these specifications, Redis clients had no trivial programmatic means to extract key names for all commands.
Cluster-aware Redis clients had to have the keys' extraction logic hard-coded in the cases of commands such as EVAL
and ZUNIONSTORE
that rely on a numkeys argument or SORT
and its many clauses.
Alternatively, the COMMAND GETKEYS
command can be used to achieve a similar extraction effect, but at a higher latency.
A Redis client isn’t obligated to support key specifications. It can continue using the legacy first key, last key and step scheme, along with the movablekeys flag, which remain unchanged.
However, a Redis client that implements key specifications support can consolidate most of its keys' extraction logic.
Even if the client encounters an unfamiliar type of key specification, it can always revert to the COMMAND GETKEYS
command.
That said, most cluster-aware clients only require a single key name to perform correct command routing, so it is possible that although a command features one unfamiliar specification, its other specification may still be usable by the client.
Key specifications are maps with four keys:
- begin_search: the starting index for keys' extraction.
- find_keys: the rule for identifying the keys relative to the begin_search.
- notes: notes about this key spec, if there are any.
- flags: indicate the type of data access.
begin_search
The begin_search value of a specification informs the client of the extraction’s beginning.
The value is a map.
There are three types of begin_search
:
- index: key name arguments begin at a constant index.
- keyword: key names start after a specific keyword (token).
- unknown: an unknown type of specification - see the incomplete flag section for more details.
index
The index type of begin_search
indicates that input keys appear at a constant index.
It is a map under the spec key with a single key:
- index: the 0-based index from which the client should start extracting key names.
keyword
The keyword type of begin_search
means a literal token precedes key name arguments.
It is a map under the spec key with two keys:
- keyword: the keyword (token) that marks the beginning of key name arguments.
- startfrom: an index to the arguments array from which the client should begin searching. This can be a negative value, which means the search should start from the end of the arguments' array, in reverse order. For example, -2 means the search starts from the penultimate argument and proceeds backward.
Examples:
- SET has a begin_search specification of type index with a value of 1.
- XREAD has a begin_search specification of type keyword with the values “STREAMS” and 1 as keyword and startfrom, respectively.
- MIGRATE has a begin_search specification of type keyword with the values of “KEYS” and -2.
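A client can implement the keyword-type begin_search with a simple scan. Here is a minimal sketch (the argument arrays are illustrative, and a real client must also validate arity and handle token case):

```python
def begin_search_keyword(args, keyword, startfrom):
    """Locate the first key name argument after a keyword token.

    A negative startfrom means: start that far from the end of the
    argument array and scan backward (e.g. -2 starts at the
    penultimate argument).
    """
    if startfrom >= 0:
        indices = range(startfrom, len(args))
    else:
        indices = range(len(args) + startfrom, -1, -1)  # reverse scan
    for i in indices:
        if args[i].upper() == keyword:
            return i + 1  # keys begin right after the token
    return None

xread = ["XREAD", "COUNT", "2", "STREAMS", "stream1", "stream2", "0-0", "0-0"]
print(begin_search_keyword(xread, "STREAMS", 1))   # 4 -> "stream1"

migrate = ["MIGRATE", "host", "6379", "", "0", "5000", "KEYS", "k1", "k2"]
print(begin_search_keyword(migrate, "KEYS", -2))   # 7 -> "k1"
```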
find_keys
The find_keys
value of a key specification tells the client how to continue the search for key names.
find_keys
has three possible types:
- range: keys stop at a specific index or relative to the last argument.
- keynum: an additional argument specifies the number of input keys.
- unknown: an unknown type of specification - see the incomplete flag section for more details.
range
The range type of find_keys
is a map under the spec key with three keys:
- lastkey: the index, relative to begin_search, of the last key argument. This can be a negative value, in which case it isn’t relative. For example, -1 indicates to keep extracting keys until the last argument, -2 until one before the last, and so on.
- keystep: the number of arguments that should be skipped, after finding a key, to find the next one.
- limit: if lastkey has the value of -1, we use the limit to stop the search by a factor. 0 and 1 mean no limit. 2 means half of the remaining arguments, 3 means a third, and so on.
keynum
The keynum type of find_keys
is a map under the spec key with three keys:
- keynumidx: the index, relative to begin_search, of the argument containing the number of keys.
- firstkey: the index, relative to begin_search, of the first key. This is usually the next argument after keynumidx, and its value, in this case, is greater by one.
- keystep: the number of arguments that should be skipped, after finding a key, to find the next one.
Examples:
- The SET command has a range of 0, 1 and 0.
- The MSET command has a range of -1, 2 and 0.
- The XREAD command has a range of -1, 1 and 2.
- The ZUNION command has a begin_search of type index with the value 1, and find_keys of type keynum with values of 0, 1 and 1.
- The AI.DAGRUN command has a begin_search of type keyword with values of “LOAD” and 1, and find_keys of type keynum with values of 0, 1 and 1.
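The two find_keys rules can be modeled in a few lines. The sketch below is a simplified client-side model (not the server’s implementation); the spec values plugged in for each command are the ones given in the examples above, and the argument arrays are illustrative:

```python
def find_keys_range(args, first, lastkey, keystep, limit):
    """Extract keys for a range-type find_keys.

    first is the absolute index of the first key (the begin_search
    result); lastkey is relative to first, or counts from the end of
    args when negative.
    """
    if lastkey >= 0:
        last = first + lastkey
    else:
        last = len(args) + lastkey
        if lastkey == -1 and limit > 1:   # 0 and 1 mean "no limit"
            remaining = len(args) - first
            last = first + remaining // limit - 1
    return args[first:last + 1:keystep]

def find_keys_keynum(args, first, keynumidx, firstkey, keystep):
    """Extract keys for a keynum-type find_keys."""
    numkeys = int(args[first + keynumidx])
    start = first + firstkey
    return args[start:start + numkeys * keystep:keystep]

# SET: begin_search index 1; range of 0, 1 and 0
print(find_keys_range(["SET", "mykey", "value"], 1, 0, 1, 0))        # ['mykey']
# MSET: begin_search index 1; range of -1, 2 and 0
print(find_keys_range(["MSET", "k1", "v1", "k2", "v2"], 1, -1, 2, 0))  # ['k1', 'k2']
# XREAD: keys start after "STREAMS" (index 4 here); range of -1, 1 and 2
print(find_keys_range(
    ["XREAD", "COUNT", "2", "STREAMS", "s1", "s2", "0-0", "0-0"],
    4, -1, 1, 2))                                                     # ['s1', 's2']
# ZUNION: begin_search index 1; keynum of 0, 1 and 1
print(find_keys_keynum(["ZUNION", "2", "z1", "z2"], 1, 0, 1, 1))      # ['z1', 'z2']
```

Note how XREAD’s limit of 2 stops the range halfway: the arguments after “STREAMS” are half stream names and half stream IDs.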
Note: this isn’t a perfect solution as the module writers can come up with anything. However, this mechanism should allow the extraction of key name arguments for the vast majority of commands.
notes
Notes about non-obvious key specs considerations, if applicable.
flags
A key specification can have additional flags that provide more details about the key. These flags are divided into three groups, as described below.
Access type flags
The following flags declare the type of access the command uses to a key’s value or its metadata. A key’s metadata includes LRU/LFU counters, type, and cardinality. These flags do not relate to the reply sent back to the client.
Every key specification has precisely one of the following flags:
- RW: the read-write flag. The command modifies the data stored in the value of the key or its metadata. This flag marks every operation that isn’t distinctly a delete, an overwrite, or read-only.
- RO: the read-only flag. The command only reads the value of the key (although it doesn’t necessarily return it).
- OW: the overwrite flag. The command overwrites the data stored in the value of the key.
- RM: the remove flag. The command deletes the key.
Logical operation flags
The following flags declare the type of operations performed on the data stored as the key’s value and its TTL (if any), not the metadata. These flags describe the logical operation that the command executes on data, driven by the input arguments. The flags do not relate to modifying or returning metadata (such as a key’s type, cardinality, or existence).
Every key specification may include the following flag:
- access: the access flag. This flag indicates that the command returns, copies, or somehow uses the user’s data that’s stored in the key.
In addition, the specification may include precisely one of the following:
- update: the update flag. The command updates the data stored in the key’s value. The new value may depend on the old value. This flag marks every operation that isn’t distinctly an insert or a delete.
- insert: the insert flag. The command only adds data to the value; existing data isn’t modified or deleted.
- delete: the delete flag. The command explicitly deletes data from the value stored at the key.
Miscellaneous flags
Key specifications may have the following flags:
- not_key: this flag indicates that the specified argument isn’t a key. This argument is treated the same as a key when computing which slot a command should be assigned to for Redis cluster. For all other purposes this argument should not be considered a key.
- incomplete: this flag is explained below.
- variable_flags: this flag is explained below.
incomplete
Some commands feature exotic approaches when it comes to specifying their keys, which makes extraction difficult.
Consider, for example, what would happen with a call to MIGRATE
that includes the literal string “KEYS” as an argument to its AUTH clause.
Our key specifications would miss the mark, and extraction would begin at the wrong index.
Thus, we recognize that such key specifications are incomplete and may fail to extract all keys. However, we assure that even incomplete specifications never yield the wrong key names, provided that the command is syntactically correct.
In the case of MIGRATE
, the search begins at the end (startfrom has the value of -1).
If and when we encounter a key named “KEYS”, we’ll only extract the subset of the key name arguments after it.
That’s why MIGRATE
has the incomplete flag in its key specification.
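This failure mode can be made concrete with a small model of the reverse scan (argument arrays are illustrative; a real client would fall back to COMMAND GETKEYS here):

```python
def migrate_keys(args):
    # Reverse scan for the "KEYS" token, as the keyword spec
    # prescribes; everything after it is taken as key names.
    for i in range(len(args) - 1, -1, -1):
        if args[i].upper() == "KEYS":
            return args[i + 1:]
    return []

# Normal case: all three keys are found.
ok = ["MIGRATE", "host", "6379", "", "0", "5000", "KEYS", "k1", "k2", "k3"]
print(migrate_keys(ok))       # ['k1', 'k2', 'k3']

# A key literally named "KEYS" shadows the token: only the subset of
# key names after it is extracted - incomplete, but never wrong names.
tricky = ["MIGRATE", "host", "6379", "", "0", "5000", "KEYS", "k1", "KEYS", "k2"]
print(migrate_keys(tricky))   # ['k2'] - k1 and the key "KEYS" are missed
```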
Another case of incompleteness is the SORT
command.
Here, the begin_search
and find_keys
are of type unknown.
The client should revert to calling the COMMAND GETKEYS
command to extract key names from the arguments, short of implementing it natively.
The difficulty arises, for example, because the string “STORE” is both a keyword (token) and a valid literal argument for SORT
.
Note:
the only commands with incomplete key specifications are SORT
and MIGRATE
.
We don’t expect the addition of such commands in the future.
variable_flags
In some commands, the flags for the same key name argument can depend on other arguments.
For example, consider the SET
command and its optional GET argument.
Without the GET argument, SET
is write-only, but it becomes a read and write command with it.
When this flag is present, it means that the key specification flags cover all possible options, but the effective flags depend on other arguments.
Examples
SET’s key specifications
1) 1) "flags"
   2) 1) RW
      2) access
      3) update
   3) "begin-search"
   4) 1) "type"
      2) "index"
      3) "spec"
      4) 1) "index"
         2) (integer) 1
   5) "find-keys"
   6) 1) "type"
      2) "range"
      3) "spec"
      4) 1) "lastkey"
         2) (integer) 0
         3) "keystep"
         4) (integer) 1
         5) "limit"
         6) (integer) 0
ZUNION’s key specifications
1) 1) "flags"
   2) 1) RO
      2) access
   3) "begin-search"
   4) 1) "type"
      2) "index"
      3) "spec"
      4) 1) "index"
         2) (integer) 1
   5) "find-keys"
   6) 1) "type"
      2) "keynum"
      3) "spec"
      4) 1) "keynumidx"
         2) (integer) 0
         3) "firstkey"
         4) (integer) 1
         5) "keystep"
         6) (integer) 1