The Trouble with PyLink

11th Jul 2020 irc


Five years ago, I started PyLink as a challenge: create a multi-network IRC services engine that would be the basis for transparent relaying between networks. The goal of this project was simple: allow networks to loosely federate and share channels while still maintaining their autonomy (opers/services/distinct branding). At the time, this idea really wasn't revolutionary; another project called Janus did pretty much the same thing over a decade ago, but suffered from a lot of code rot over the years. I wanted to see if I could make something better.

Implementation-wise, PyLink's Relay feature performs "puppeting" over a services link: on each participating network, it creates virtual clients for remote users to send messages and events from. And the initial implementation worked fairly well. While designing the project, I wrote up a pretty lengthy protocol spec for IRCd integrations, and this provided a functional baseline for Relay. In particular, I wanted the IRC integration to be as native as possible, so that moderation tools, modes, and cloaking generally work fine over Relay.

Here be dragons...

In 2016, I implemented a Clientbot transport. Unlike PyLink's other protocol modules, which link as a server, Clientbot uses an IRC bot and forwards users and state info back to all the networks fully connected to Relay (i.e. via the services link). This feature had been part of Janus for quite some time, and many users requested the same in PyLink Relay. So I went ahead and did it, and the end result worked okay. (Clientbot even supports IRCv3 features, woohoo!)

...But Clientbot quickly became one of many implementation headaches that I wish I had thought through more when I first designed it.

Recall that the protocol spec represents what a services server is able to do. This includes things like freely creating virtual users, and keeping detailed track of which users are connected and which channels they are in.* Now, we have the task of providing the same interface given the very limited capabilities of an IRC bot. In short, I did a lot of stubbing to make the Clientbot interface appear reasonable. Effectively the whole process became: create virtual users in the state only, and delegate any attempts to send messages from them to a custom set of handlers. This became the relay_clientbot plugin, which catches these events and posts them as text into a target channel, much like any other relay bot.
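
To make the stubbing a bit more concrete, here is a minimal sketch of the idea in Python. It is not PyLink's actual code: `ClientbotStub`, `VirtualUser`, `spawn_client`, `message`, and `prefix_with_nick` are all hypothetical names made up for illustration. The point is only that the "virtual user" lives in local state, and anything it tries to say gets rewritten and sent by the one real bot connection.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class VirtualUser:
    uid: str
    nick: str


@dataclass
class ClientbotStub:
    """Hypothetical stand-in for a Clientbot-style protocol module."""
    bot_nick: str
    users: Dict[str, VirtualUser] = field(default_factory=dict)
    handlers: List[Callable[[VirtualUser, str, str], str]] = field(default_factory=list)
    sent_lines: List[str] = field(default_factory=list)

    def spawn_client(self, uid: str, nick: str) -> VirtualUser:
        # A real services link would introduce this user to the network.
        # A bot cannot do that, so the user exists only in local state.
        user = VirtualUser(uid, nick)
        self.users[uid] = user
        return user

    def message(self, source_uid: str, target: str, text: str) -> str:
        # The bot is the only real connection, so every "virtual" message
        # is rewritten by the handlers and then sent as the bot itself.
        user = self.users[source_uid]
        for handler in self.handlers:
            text = handler(user, target, text)
        line = f"PRIVMSG {target} :{text}"
        self.sent_lines.append(line)
        return line


def prefix_with_nick(user: VirtualUser, target: str, text: str) -> str:
    # Roughly what any relay bot does: tag the text with the sender's name.
    return f"<{user.nick}> {text}"
```

With `prefix_with_nick` registered as a handler, a message "from" the virtual user `alice/net2` ends up on the wire as an ordinary `PRIVMSG` from the bot with `<alice/net2>` prepended to the text.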

But Clientbot also broke another key expectation:

From the beginning, PyLink tracked users by unique UIDs. Every robust IRC server protocol these days uses them, including UnrealIRCd, InspIRCd, and anything else based on TS6 or P10. (The premise, I'm assuming, is that state tracking is a nightmare if your user keys change all the time.) Unfortunately, IRC C2S does not give you this luxury. Instead, I had to create virtual UIDs to represent every user Clientbot sees, the first time it sees them. And mind you, IRC C2S doesn't explicitly tell you at all when users connect either; you have to just enumerate users as you see them.

Also, we have to balance these virtual UIDs with all the virtual clients that Relay et al. created earlier. Unsurprisingly, I screwed up a lot while working on this.

And, we need to remember to filter out pseudo UIDs from all outgoing messages. There are many places where these can leak, like in mode arguments or kick targets. (As you can imagine, I screwed this up too on multiple occasions.)
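
As a rough illustration of the bookkeeping involved (again a sketch, not what PyLink actually does), the hypothetical `PseudoUidTracker` below hands out a made-up UID the first time a nick is seen, keeps it stable across nick changes, and swaps pseudo UIDs back to nicks in outgoing arguments. The `PUID-` prefix and the use of `lower()` in place of real IRC casemapping are assumptions for the example.

```python
import itertools


class PseudoUidTracker:
    """Hypothetical tracker mapping nicks seen by a bot to made-up UIDs."""

    PREFIX = "PUID-"  # assumed marker, chosen only for this example

    def __init__(self):
        self._counter = itertools.count(1)
        self.uid_by_nick = {}
        self.nick_by_uid = {}

    def uid_for(self, nick):
        # IRC nicks are case-insensitive; lower() is a simplification of
        # the real casemapping rules.
        key = nick.lower()
        if key not in self.uid_by_nick:
            uid = f"{self.PREFIX}{next(self._counter)}"
            self.uid_by_nick[key] = uid
            self.nick_by_uid[uid] = nick
        return self.uid_by_nick[key]

    def rename(self, old_nick, new_nick):
        # The UID stays stable across nick changes; only the mapping moves.
        uid = self.uid_by_nick.pop(old_nick.lower())
        self.uid_by_nick[new_nick.lower()] = uid
        self.nick_by_uid[uid] = new_nick
        return uid

    def to_wire(self, args):
        # Replace any pseudo UID in outgoing arguments with the current nick,
        # so that made-up IDs never leak into mode arguments or kick targets.
        return [self.nick_by_uid.get(arg, arg) for arg in args]
```

Every outgoing command has to pass through something like `to_wire`; forgetting it on even one code path is exactly how a pseudo UID ends up in a channel as a kick target.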

* For those who don't know, keeping track of state is a requirement for all IRC services (unless you don't have any code that needs it). Server protocols are fundamentally different from the client protocol, and client commands like /names and /who simply do not work here. Servers send each other all the state info that they know about on connect (channels, users, etc.), and expect the other end to keep track of everything themselves.

Then the dragons start a nest...

In 2019, we had the ambitious idea of bridging Discord to IRC using the same protocol specification. Earlier in v2.0, I had already refactored PyLink's core to separate the code most specific to IRC (e.g. connection handling) from the rest of the state tracking tools. So, I thought this would work fine (oh boy, was I wrong).

To summarize, pylink-discord broke even MORE assumptions. I'll write them here in list form to keep things readable:

- Channels are now stored by integer IDs instead of strings. TypeErrors galore!
- User IDs may now be integers too. More TypeErrors!
- Nicks are no longer guaranteed to be unique (wait, really?)
- Nicks can now include whitespace and Unicode text (this is fine, except it makes command parsing horrible)
- State change commands like JOIN and PART now make absolutely no sense.
  - Channel presence on Discord is determined solely by whether a user has permissions to read a channel; there aren't any explicit join and leave events.
  - But, we have to pretend there are anyways, because that's what the protocol spec says.
- Channel user lists can't be gathered the same way anymore. We now have to make paginated API calls to fetch every user in the guild
  - ...And then check whether they have permission to read each text channel that the bot is in
  - As you can imagine, this doesn't scale well for larger guilds (and this is a known issue)
- Discord also doesn't allow bots to create virtual users in the same way IRC servers do. So... inherit from Clientbot, I guess?
  - But now we've also moved the rewriting of outgoing messages from relay_clientbot to the protocol module itself, to support username spoofing via webhooks...
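
The JOIN/PART mismatch is easier to see in code. The sketch below is not taken from pylink-discord: `readable_members`, `synthesize_membership_events`, and the `can_read` callback are hypothetical names, and the Discord API itself is left out entirely. It assumes you already have the full member list and some way to ask "can this user read this channel?", then fakes join and leave events by diffing two snapshots.

```python
def readable_members(members, channels, can_read):
    """Build {channel_id: set(user_id)} from a permission check."""
    # One permission check per (member, channel) pair, which is why this
    # approach scales poorly on large guilds.
    return {
        channel_id: {user_id for user_id in members if can_read(user_id, channel_id)}
        for channel_id in channels
    }


def synthesize_membership_events(previous, current):
    """Diff two {channel_id: set(user_id)} snapshots into fake JOIN/PART events."""
    events = []
    for channel_id in sorted(set(previous) | set(current)):
        before = previous.get(channel_id, set())
        after = current.get(channel_id, set())
        for user_id in sorted(after - before):
            events.append(("JOIN", channel_id, user_id))
        for user_id in sorted(before - after):
            events.append(("PART", channel_id, user_id))
    return events
```

Note that the channel and user IDs here are plain integers, which is where the TypeErrors come from once they reach code that expects string channel names and UIDs.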

Plus (and this is really the kicker), we had to monkey-patch all of Python's networking and threading stack for our Discord library to run. And this dependency totally doesn't break in confusing ways whenever new Python versions release.

This is fine...

When Python 3.8 released, pylink-discord broke down completely. For reference, this is the default version shipped with Ubuntu 20.04, so anyone who upgraded early now had a services server that refused to start with no error message whatsoever. And at this point, I was really doubting the feasibility of pylink-discord.

More broadly, I've come to realize that PyLink is simply the wrong tool for bridging together multiple chat platforms. PyLink's APIs were designed to develop IRC services, and had to be tightly coupled to IRC details as a result. Hacking pylink-discord in the way I did was a mistake, and I should've realized it much earlier.

So now what?

In some ways, this post is a call for help. I don't have nearly as much time these days to work on PyLink, but the project still has way more user interest than developer interest. Just look at this commit breakdown from the last five years:

$ git shortlog -sne
  3600  James Lu <[email protected]>
    29  Daniel Oaks <mail@hidden>
    25  Ken Spencer <mail@hidden>
    15  Mitchell Cooper <mail@hidden>
    10  Celelibi <mail@hidden>
     3  Ian Carpenter <mail@hidden>
     3  Ken Spencer <mail@hidden>
     2  Austin Ellis <mail@hidden>
     2  Celelibi <mail@hidden>
     2  Jordy Zomer <mail@hidden>

For something like pylink-discord to be sustainable, a lot of work needs to happen to modernize PyLink and rid it of tech debt.

- Refactor PyLink's core to decouple state, connection handling, and IRC-specific utilities (e.g. mode parsers)
- Add test cases for protocols and core functionality - currently PyLink has very sparse test coverage which covers only a few components
- Design a new / separate API that abstracts away IRC protocol details, while still allowing platforms to share messages and state information
- (For the future) modernize and port PyLink from select & threads to asyncio. This will make interacting with mainstream chat platform libraries a lot easier.

What's the alternative?

~~PyLink changes maintainer 5 times in a row and ends up unmaintained like Janus.~~

Recently, I've been working on a stateless alternative to all this bridging fuss. It's called RELAYMSG, and it's about as feature complete as, say, Discord webhooks. Essentially, messages are sent from virtual users, but those users are not represented in the state at all. Instead, the local IRCd translates commands sent via RELAYMSG into a PRIVMSG from a virtual user. (Yes, this method has its ups and downs, but it conveniently makes the code 50 times simpler.)
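
To show how little state this needs, here is a toy translation function. Treat every detail as an assumption: the `RELAYMSG <channel> <nick> :<text>` command shape, the `/` separator check, and the `relay@relay.invalid` placeholder hostmask are there for illustration only, and `relaymsg_to_privmsg` is a hypothetical name. The proposal and the actual implementations are the authority on the real syntax and rules.

```python
def relaymsg_to_privmsg(line, separator="/"):
    """Translate an assumed 'RELAYMSG <channel> <nick> :<text>' into a PRIVMSG."""
    head, sep, text = line.partition(" :")
    parts = head.split()
    if not sep or len(parts) != 3 or parts[0].upper() != "RELAYMSG":
        raise ValueError("expected: RELAYMSG <channel> <nick> :<text>")
    _, channel, nick = parts
    if separator not in nick:
        # Assumed rule: relayed nicks carry a character that marks them as
        # relayed, so they cannot be confused with real users.
        raise ValueError(f"relayed nick must contain {separator!r}")
    # No user object is created or stored anywhere: the source prefix is
    # built on the fly and forgotten once the line is delivered.
    return f":{nick}!relay@relay.invalid PRIVMSG {channel} :{text}"
```

Compare this with the virtual-user approach: there are no UIDs to allocate, no channel membership to fake, and nothing to clean up when the remote user goes away.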

You can find the IRCv3 proposal for RELAYMSG here: https://github.com/ircv3/ircv3-specifications/pull/417 †

So far the following vendors have implemented it:

- Oragono
- InspIRCd via custom module: https://github.com/overdrivenetworks/inspircd-contrib/blob/relaymsg/3.0/m_relaymsg.cpp
- matterbridge fork: https://github.com/overdrivenetworks/matterbridge/

† Yes, I know this proposal is controversial. I also feel that there's a non-zero number of people who think bridging solutions for IRC are an abomination. But I don't think relays are a bad idea in general - after all, they're just a form of interoperability. As long as IRC is an open platform, I will happily hack on it and add features that work for my community. And I hope others feel encouraged to do the same.
