This is the first post on this blog; it was published on February 19, 2021.
The year 2020 will be remembered for the pandemic, the BLM movement, and the U.S. elections, among a billion other things around the globe. It is funny that I even mention the third one, considering how little I give a damn about U.S. politics, yet this whole story begins with the Parler deplatforming that happened about a month ago. Let me remind you: Apple, Google, Amazon, and a few other companies terminated their services to the free speech social network for insufficient moderation, effectively destroying the platform in just a couple of days. What the fuck?
OK, let me be clear about my position: I believe every private company has a right to refuse service to anyone, whether an individual or a business, but I also have my own right to despise them for exercising it. What they did was probably legal, but screw them anyway; they failed us. Regardless of what these psychopathic corporations like to tell the public, they are only concerned with maximizing shareholder value, and if there is anything even remotely resembling an image liability (through pressure by political radicals, cancel culture SJWs, you name it), they will not think twice. What disgusts me the most here is neither greed nor hypocrisy but their unwillingness to grow a pair of balls and stand up for freedom of speech.
You see, freedom of speech and expression must be absolute. You cannot have censorship-resistance with exceptions; otherwise, these exceptions could be used to remove or block anything unwanted, not just anything offensive. This is how the Chinese cannot access Wikipedia because of what originally started as a counter-terrorism measure, and the Russians cannot access LinkedIn because of what originally started as a child protection measure. We cannot deprive humanity of its freedom just because some small fraction of users might, unfortunately, use that freedom to spread offensive content. In the same way, you do not ban electricity because people get electrocuted.
Let us now switch from corporations to governments. Ooh, wee! Do not even get me started on that. And I am not even talking about cases like Google happily not letting people disable SafeSearch in Indonesia because its government knows best; that is just the tip of the iceberg. I am talking about political censorship, which includes silencing people with torture, gulags, and bullets. Here is the world map of press freedom status:
Just look at this mess. Blue tones mean OK-ish, others not so much. This map is NordNordWest’s work based on the 2020 Press Freedom Index and is distributed under CC BY-SA 3.0 de. Keep in mind population densities: e.g. there are about 90 times more people per unit area living in Vietnam than in Australia. I am actually surprised the U.S. did so well in 2020, considering how badly they wanted Mr. Assange to be extradited and executed.
What would you answer your children if they asked you how in the world North Korea still exists in its current form, with 25 million Koreans suffering for over 70 years while no one does anything about it? Or how about 28 million people in Venezuela? Or 82 million people in Iran? Giving voice to all whistleblowers and activists, especially the ones risking their lives and freedom in hostile environments, is the fundamental goal of Pepe.
There are already Tor, I2P, Freenet, GNUnet, etc. We can run email, message boards, BitTorrent, Kad, and IPFS on top of them, and maybe even use Ethereum smart contracts for decentralized computing. All the technology is there, so why bother with something new? Well, first of all, these are all amazing projects, and there is nothing wrong with them. The peculiar thing, however, is that none of them except BitTorrent (and perhaps Tor) has gained much popularity, nor do we see any readily available censorship-resistant communication platforms. Why is that?
I claim there are two main reasons for that:
They are hard to use. The “Unix is user-friendly, it is just picky about who its friends are.” aphorism still lives on in most of them: you may need to install a bunch of additional software (such as a JVM or shared libraries), potentially dealing with dependency hell on some platforms; read through sparse documentation and dead forums on optimal network, security, and sharing settings; carefully configure your router, computer, and client; then install, study, and configure the applications running on top of the darknet, i.e. repeat the steps. Tor became popular outside of research not because it was first, but because of the hacky all-in-one Tor Browser Bundle with sane defaults.
They prefer purity to practicality. Instead of concentrating manpower on a few specific use cases, most existing tools try to conquer the world: a new internet, interplanetary, infrastructure, an application framework, APIs, a Turing-complete language on the blockchain, etc. This is great and all: it is general, conceptual, modular, extensible, and stackable (everything we like), but sometimes overengineering is just overengineering given the goal. And our goal here is not to start a technical revolution, but to help as many people as we can communicate without fear of retribution.
BitTorrent evolved into something used by 150 million people worldwide; it seamlessly adopted DHT, PEX, µTP, and trackerless magnet links, and people do not even know what the hell any of it means. Even though it was proprietary, Skype thrived in a very similar way (at least before it was crippled by Microsoft): millions of its users did not even know what peer-to-peer meant, let alone how it worked under the hood; it just worked. These two systems succeeded not because of luck but as a result of some excellent product decisions. We need to learn from that and iterate.
For the messaging platform, I chose an imageboard similar to 4chan or Futaba Channel. While not the most popular type of forum, imageboards are extremely flexible and free of junk like authentication or karma, they promote anonymity in a very practical way, and over 30 million people are already familiar with them. Admittedly, I am not a big fan of their crowded old-school design, but initial user traction is more important than my sense of beauty; we will refine the looks over time.
That is, the Pepe imageboard is going to be the only application running on top of the Pepe darknet; they are in fact inseparable. This way, we can design the network specifically for this one use case, which brings both security and performance benefits. Joining the darknet can be as simple as double-clicking the application, and users do not need to install or configure any third-party browsers or proxy servers; they can just go to localhost:8666 using Chrome, Safari, or whatever they like, and it is going to be safe without any third-party extensions.
Once online, users may browse existing message boards or create new ones about various topics in any language, such as /ja/math/. A board is a collection of threads about something more specific, be it an idea or a question. A thread is a collection of posts that people send in reply to each other. Each post may have one or more attachments such as photos, videos, you name it. To give you an idea of what it looks like, here is a screenshot of a random thread on the 4chan DIY board:
What is fundamentally different about Pepe is moderation. Instead of relying on a centralized entity with a banhammer, each board and thread owner may anonymously moderate their own spaces. However, nothing can actually be deleted, only shadowed, and each user decides at any moment whether they want to see the light or the full version of the page. People can still reply to shadowed posts inside their own shadowed posts, so no one can silence anyone, only maintain order on the light side.
If people are no longer interested in a particular thread, it will eventually be forgotten by the network and naturally disappear from its board. But if at least one person is subscribed to or has archived a thread, no one in the world (not even the Pepe creators) can censor or somehow shut it down without hurting most of the network Pepe is running on top of.
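To make the light/full distinction concrete, here is a minimal sketch in Go, the language Pepe is being written in. The `Post` type and the `inLight` and `view` helpers are hypothetical, not Pepe's actual data model; the one rule encoded is the one stated above, that replies to a shadowed post are themselves shadowed.

```go
package main

import "fmt"

// Post is a hypothetical, simplified post record. Shadowed would be set by a
// signed moderation action from the board or thread owner; it never deletes.
type Post struct {
	ID       int
	ReplyTo  int // 0 if the post starts the thread
	Text     string
	Shadowed bool
}

// inLight reports whether a post belongs to the light side: neither the post
// nor anything it (transitively) replies to is shadowed. Replies are assumed
// to point only at earlier posts, so the walk always terminates.
func inLight(byID map[int]Post, id int) bool {
	for id != 0 {
		p, ok := byID[id]
		if !ok {
			return true // ancestor not downloaded yet: nothing known to hide
		}
		if p.Shadowed {
			return false
		}
		id = p.ReplyTo
	}
	return true
}

// view returns the full page or its light version, as the reader chooses.
func view(posts []Post, full bool) []Post {
	if full {
		return posts
	}
	byID := make(map[int]Post, len(posts))
	for _, p := range posts {
		byID[p.ID] = p
	}
	var light []Post
	for _, p := range posts {
		if inLight(byID, p.ID) {
			light = append(light, p)
		}
	}
	return light
}

func main() {
	thread := []Post{
		{ID: 1, Text: "original report"},
		{ID: 2, ReplyTo: 1, Text: "off-topic rant", Shadowed: true},
		{ID: 3, ReplyTo: 2, Text: "reply to the rant"},
		{ID: 4, ReplyTo: 1, Text: "follow-up question"},
	}
	fmt.Println("light:", len(view(thread, false)), "full:", len(view(thread, true))) // light: 2 full: 4
}
```

Nothing is removed from `posts`; asking for the full view again restores every post, which is the point of shadowing instead of deleting.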
The three biggest problems with 4chan and similar communication platforms are proprietary software, centralization, and lack of privacy.
Mitigating the first problem is the easiest: just use open-source software whenever possible. Regarding centralization, we could switch to a decentralized solution like BitTorrent (imagine each torrent containing a thread with its posts and attachments), but that by itself does not help with privacy: people can still see what others are doing. Similarly, we could tackle privacy with a VPN or a darknet like Tor or I2P, but that, contrary to popular belief, does not solve centralization in any way. Clearly, we need the best of both worlds. Let us fuse them together!
Here is a very simplified walkthrough of how the network works. Imagine Bob is an undercover journalist who wants to share his report anonymously, and Alice is a political activist interested in the investigation Bob has been doing. It all starts with Bob announcing he has the report:
Bob joins the network and gathers information about random peers on it through the DHT. This way, Bob discovers hundreds of participants, including X and Y. Similarly, Bob registers himself on the network through the DHT so that others can use him as their peer. At this moment, some parties, such as Bob’s ISP, might learn that Bob is on the network if he uses his home internet connection, but they can only guess his intentions. If anyone might suspect that Bob is the one behind those investigations, he should ideally use a public Wi-Fi network or a restricted route to a trusted party instead.
Bob assembles a pool of ephemeral tunnels. A tunnel is a chain of peers relaying traffic for each other, and Bob chooses X and Y to form one such tunnel. Notice that X and Y are not related in any way; they are chosen randomly. He tells X to forward messages to him as long as the right token is provided and instructs X to ask Y to forward messages to X under the same condition. At this point, Y has no idea who created the tunnel, nor what the token represents. And because tunnels do not have a fixed length, X can only guess whether Bob initiated the tunnel construction; he might have just forwarded a message from someone else. This gives us plausible deniability.
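Tunnel construction can be sketched in Go under the assumption that each relay stores nothing but the token and the neighbour to pass matching traffic back to. `Hop`, `buildTunnel`, and the 128-bit token size are hypothetical, and the per-hop encryption a real tunnel would set up is omitted.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	mrand "math/rand"
)

// Hop is everything a single relay remembers about a tunnel (hypothetical layout).
type Hop struct {
	Peer    string // the relay itself, e.g. "X"
	Token   string // opaque to the relay: it cannot tell what it stands for
	Forward string // neighbour that matching messages are passed back to
}

// newToken returns a random 128-bit token, hex-encoded.
func newToken() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

// buildTunnel picks a random length in [minLen, maxLen], then that many
// distinct random peers (the list must not contain the creator). It returns
// the state each relay would store, nearest to the creator first.
func buildTunnel(creator string, peers []string, minLen, maxLen int) []Hop {
	n := minLen + mrand.Intn(maxLen-minLen+1)
	if n > len(peers) {
		n = len(peers)
	}
	token := newToken()
	hops := make([]Hop, 0, n)
	prev := creator
	for _, i := range mrand.Perm(len(peers))[:n] {
		hops = append(hops, Hop{Peer: peers[i], Token: token, Forward: prev})
		prev = peers[i]
	}
	return hops
}

func main() {
	peers := []string{"X", "Y", "P", "Q", "R"}
	for _, h := range buildTunnel("Bob", peers, 2, 4) {
		fmt.Printf("%s: token %s… -> %s\n", h.Peer, h.Token[:8], h.Forward)
	}
}
```

Only the first hop's `Forward` field names the creator, and because the length is random, even that hop cannot tell whether the creator originated the tunnel or merely relayed it.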
Bob associates a key with his report. To do so, he splits the report into pieces, builds a Merkle tree from them, and uses the root hash as the key. He then uses his X-Y tunnel to announce that key to the DHT. For that, he uses garlic routing with multi-layer encryption, so that the only thing X knows is that Bob sent or forwarded something encrypted to be forwarded to Y, and Y only knows that X sent or forwarded a request to register Y as the rendezvous point for a certain key, along with the token mentioned above. At this point, no one, including the peers forming the DHT, knows what the key represents and, most importantly, who created it.
Let us now switch to Alice and assume for a moment that she already knows the key by which she can find Bob’s report. Then the following happens:
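Deriving the report's key can be sketched in Go with `crypto/sha256`. The 16 KiB piece size and the rule of promoting an unpaired node unchanged to the next level are assumptions; Merkle tree variants differ on such details, and the post does not fix them.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// pieceSize is a hypothetical piece length; the post does not specify one.
const pieceSize = 16 * 1024

// split cuts data into fixed-size pieces; the last one may be shorter.
func split(data []byte, size int) [][]byte {
	var pieces [][]byte
	for len(data) > size {
		pieces = append(pieces, data[:size])
		data = data[size:]
	}
	return append(pieces, data)
}

// merkleRoot hashes every piece into a leaf, then hashes adjacent pairs level
// by level until a single hash, the root, is left. An unpaired node at the
// end of a level is promoted unchanged (an assumed convention).
func merkleRoot(pieces [][]byte) [32]byte {
	level := make([][32]byte, len(pieces))
	for i, p := range pieces {
		level[i] = sha256.Sum256(p)
	}
	for len(level) > 1 {
		var next [][32]byte
		for i := 0; i < len(level); i += 2 {
			if i+1 == len(level) {
				next = append(next, level[i])
				continue
			}
			pair := make([]byte, 0, 64)
			pair = append(pair, level[i][:]...)
			pair = append(pair, level[i+1][:]...)
			next = append(next, sha256.Sum256(pair))
		}
		level = next
	}
	return level[0]
}

func main() {
	report := make([]byte, 40*1024) // stand-in for Bob's report
	key := merkleRoot(split(report, pieceSize))
	fmt.Println("key:", hex.EncodeToString(key[:]))
}
```

Because every piece feeds into the root, changing a single byte of the report yields a different key.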
Alice joins the network, learns about peers such as U and V, registers herself in the DHT, and assembles a pool of ephemeral tunnels in exactly the same manner Bob did.
Alice uses her U-V tunnel to look up the key she knows. Just like with Bob, no one on the network knows what Alice is doing: U only knows that Alice sent or forwarded something encrypted to be forwarded to V, and V only knows that U sent or forwarded a request to look up the metadata associated with a certain key. Once the DHT returns the metadata to V, V sends it back to U encrypted with a password Alice provided, so that U cannot read the message; U encrypts the message again with its own password and forwards it to Alice. Since Alice knows both passwords, she decrypts the message and learns that Y is the rendezvous point for Bob’s report and which token corresponds to it.
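The two-layer reply can be modelled with AES-256-GCM from Go's standard library, treating each "password" Alice hands out as a random 32-byte symmetric key. This is a simplification: the helper names are hypothetical, and a real garlic-routing layer would typically derive such keys through a key exchange.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// seal encrypts msg under a 32-byte key with AES-256-GCM, prepending the nonce.
func seal(key, msg []byte) []byte {
	block, err := aes.NewCipher(key)
	if err != nil {
		panic(err)
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		panic(err)
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		panic(err)
	}
	return gcm.Seal(nonce, nonce, msg, nil)
}

// open reverses seal and fails if the key is wrong or the data was altered.
func open(key, box []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(box) < gcm.NonceSize() {
		return nil, fmt.Errorf("ciphertext too short")
	}
	nonce, ct := box[:gcm.NonceSize()], box[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

// randomKey stands in for a "password" Alice gives to one hop.
func randomKey() []byte {
	k := make([]byte, 32)
	if _, err := rand.Read(k); err != nil {
		panic(err)
	}
	return k
}

// replyThroughTunnel models the return path V -> U -> Alice: each hop adds
// one layer using the key Alice gave it.
func replyThroughTunnel(metadata, keyV, keyU []byte) []byte {
	fromV := seal(keyV, metadata) // U cannot read this inner layer
	return seal(keyU, fromV)      // outer layer added by U
}

// unwrap is Alice's side: peel U's layer first, then V's.
func unwrap(box, keyV, keyU []byte) ([]byte, error) {
	inner, err := open(keyU, box)
	if err != nil {
		return nil, err
	}
	return open(keyV, inner)
}

func main() {
	keyU, keyV := randomKey(), randomKey()
	box := replyThroughTunnel([]byte("rendezvous=Y"), keyV, keyU)
	got, err := unwrap(box, keyV, keyU)
	fmt.Println(string(got), err)
}
```

U holds only its own key, so V's inner layer stays opaque to it, and GCM's authentication makes any tampering with either layer fail in `open`.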
Alice uses her U-V tunnel to connect to Y in a similar manner. When Y receives the token, it forwards Alice’s request to X, and X forwards it to Bob so he can send his report back to Alice through X, Y, V, and finally U, end-to-end encrypted with a password she provided. Just as before, no one understands who is doing what on whose behalf, thanks to garlic routing, in which each party also adds random delays drawn from an exponential distribution. What is more, Alice has no idea where she is getting the report from, nor does Bob know whom he is sending it to. Even if Bob were sending his report to malicious Eve, there would be no risk to him unless Eve controlled most of the network.
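Drawing the per-hop delays is essentially a one-liner with `math/rand`'s `ExpFloat64`, which samples an exponential distribution with mean 1. The 200 ms mean below is a placeholder, since the post gives no figure.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// meanDelay is a placeholder; the post does not specify the mean per hop.
const meanDelay = 200 * time.Millisecond

// hopDelay samples one delay from an exponential distribution with the given
// mean. rand.ExpFloat64 has mean 1, so scaling by mean sets the average.
// The exponential is memoryless: how long a message has already waited says
// nothing about how much longer it will wait, the usual argument for using
// it to blur timing correlations.
func hopDelay(mean time.Duration) time.Duration {
	return time.Duration(rand.ExpFloat64() * float64(mean))
}

func main() {
	var total time.Duration
	// The report travels Bob -> X -> Y -> V -> U -> Alice; each relay draws its own delay.
	for _, relay := range []string{"X", "Y", "V", "U"} {
		d := hopDelay(meanDelay)
		total += d
		fmt.Printf("%s holds the message for %v\n", relay, d.Round(time.Millisecond))
	}
	fmt.Println("added latency:", total.Round(time.Millisecond))
}
```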
As soon as Alice starts downloading the report from Bob, she may announce one of her rendezvous points as well as its token to be associated with the same key the way Bob did, so that other people can start downloading the report from both Alice and Bob simultaneously. Maybe someone has already downloaded parts of the report from Bob in the meantime, so Alice can rely on the swarm of peers distributing it among each other even if Bob decided to leave the network. Alice can learn about the swarm through peer exchange or the DHT the same way she learned about Y being the rendezvous.
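Since pieces arrive from arbitrary members of the swarm, each one has to be checked against the report's key before it is accepted, and a Merkle proof allows that without downloading the rest of the report. The self-contained Go sketch below assumes SHA-256 and the convention that an unpaired node is promoted unchanged; `levels`, `proof`, and `verify` are illustrative names, not a specified Pepe format.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

type Hash = [32]byte

// Step is one level of a proof: the sibling hash and which side it sits on.
type Step struct {
	Sibling Hash
	Left    bool // true if the sibling is the left operand
}

func hashPair(a, b Hash) Hash {
	buf := make([]byte, 0, 64)
	buf = append(buf, a[:]...)
	buf = append(buf, b[:]...)
	return sha256.Sum256(buf)
}

// levels builds every level of the tree, leaves first, root last.
func levels(pieces [][]byte) [][]Hash {
	level := make([]Hash, len(pieces))
	for i, p := range pieces {
		level[i] = sha256.Sum256(p)
	}
	all := [][]Hash{level}
	for len(level) > 1 {
		var next []Hash
		for i := 0; i < len(level); i += 2 {
			if i+1 == len(level) {
				next = append(next, level[i]) // unpaired node promoted as-is
			} else {
				next = append(next, hashPair(level[i], level[i+1]))
			}
		}
		all = append(all, next)
		level = next
	}
	return all
}

// proof collects the sibling hashes a seeder sends alongside piece idx.
func proof(all [][]Hash, idx int) []Step {
	var steps []Step
	for _, level := range all[:len(all)-1] {
		if sib := idx ^ 1; sib < len(level) {
			steps = append(steps, Step{Sibling: level[sib], Left: sib < idx})
		}
		idx /= 2
	}
	return steps
}

// verify is what a downloader such as Alice runs before accepting a piece.
func verify(root Hash, piece []byte, steps []Step) bool {
	h := sha256.Sum256(piece)
	for _, s := range steps {
		if s.Left {
			h = hashPair(s.Sibling, h)
		} else {
			h = hashPair(h, s.Sibling)
		}
	}
	return h == root
}

func main() {
	pieces := [][]byte{[]byte("p0"), []byte("p1"), []byte("p2")}
	all := levels(pieces)
	root := all[len(all)-1][0]
	fmt.Println(verify(root, pieces[2], proof(all, 2)))
	fmt.Println(verify(root, []byte("forged"), proof(all, 2)))
}
```

A proof holds roughly one hash per tree level, so it stays small even for large reports, and a peer that sends a bad piece is caught immediately.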
We can now drop the assumption that Alice already knew the key of Bob’s report. This is how she would obtain it:
After Bob announces his report on the network, he creates a thread, summarizes his findings in its first post, and links his report as an attachment through its key. A thread is basically a regular file that encodes a Merkle-CRDT data structure with posts and their attachment previews as events. He then generates a public and private key pair for that thread, digitally signs his post with the private key, and announces his thread on the DHT, using the public key as the key of the thread, in exactly the same manner he just announced his report.
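A sketch of the thread key pair and the signed first post, assuming Ed25519 from Go's standard `crypto/ed25519` package; the post does not name a signature scheme, and `Thread`, `SignedPost`, and `verifyPost` are hypothetical.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// Thread holds the owner's key pair; the public key doubles as the thread's DHT key.
type Thread struct {
	Pub  ed25519.PublicKey
	priv ed25519.PrivateKey
}

// SignedPost is a post as it might travel through the swarm (hypothetical layout).
type SignedPost struct {
	Body []byte
	Sig  []byte
}

func newThread() Thread {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	return Thread{Pub: pub, priv: priv}
}

// Key is the hex-encoded public key that would be announced on the DHT.
func (t Thread) Key() string { return hex.EncodeToString(t.Pub) }

// Sign produces a post only the thread owner could have created.
func (t Thread) Sign(body []byte) SignedPost {
	return SignedPost{Body: body, Sig: ed25519.Sign(t.priv, body)}
}

// verifyPost lets any peer check a post using nothing but the thread key.
func verifyPost(threadKey ed25519.PublicKey, p SignedPost) bool {
	return ed25519.Verify(threadKey, p.Body, p.Sig)
}

func main() {
	th := newThread()
	op := th.Sign([]byte("Summary of my findings; report attached by key."))
	fmt.Println("thread key:", th.Key())
	fmt.Println("first post valid:", verifyPost(th.Pub, op))
}
```

Any peer that knows the thread key can check a post offline; a post altered in transit, or signed with another key, fails verification.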
Just like Alice would discover the swarm of peers for Bob’s report, Bob uses the DHT to discover and join the swarm of peers for the board he chooses to create his thread on. In case Bob does not know the key for the board, he can first obtain it in the same way from the metaboard whose key is hardcoded to zero. Once there, he publishes the key for his thread and its first post to the swarm using GossipSub. In order for his thread to be accepted by the swarm, Bob needs to first solve a Hashcash puzzle whose difficulty is determined by the message rate in the pub/sub to protect the board from spam.
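A Hashcash-style puzzle in Go: find a nonce such that the SHA-256 hash of the message followed by the nonce starts with enough zero bits. Original Hashcash uses SHA-1 and its own stamp format, so SHA-256 here is a stand-in. The `difficulty` policy (12 bits, plus one bit for each doubling of the rate past 10 messages per minute) is invented for illustration, since the post only says difficulty follows the message rate.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math/bits"
)

// leadingZeroBits counts the zero bits at the start of h.
func leadingZeroBits(h [32]byte) int {
	n := 0
	for _, b := range h {
		if b != 0 {
			return n + bits.LeadingZeros8(b)
		}
		n += 8
	}
	return n
}

// difficulty is a hypothetical policy tying the puzzle to the board's rate.
func difficulty(msgsPerMinute float64) int {
	const base, comfortable = 12, 10.0
	d := base
	for r := comfortable; msgsPerMinute > r; r *= 2 {
		d++
	}
	return d
}

// hashWithNonce computes SHA-256(msg || big-endian nonce).
func hashWithNonce(msg []byte, nonce uint64) [32]byte {
	buf := make([]byte, len(msg)+8)
	copy(buf, msg)
	binary.BigEndian.PutUint64(buf[len(msg):], nonce)
	return sha256.Sum256(buf)
}

// solve is the sender's work: about 2^bitsNeeded hashes on average.
func solve(msg []byte, bitsNeeded int) uint64 {
	for nonce := uint64(0); ; nonce++ {
		if leadingZeroBits(hashWithNonce(msg, nonce)) >= bitsNeeded {
			return nonce
		}
	}
}

// check is the single hash every peer in the swarm computes to verify.
func check(msg []byte, nonce uint64, bitsNeeded int) bool {
	return leadingZeroBits(hashWithNonce(msg, nonce)) >= bitsNeeded
}

func main() {
	msg := []byte("thread-key|first post")
	d := difficulty(25) // 25 messages per minute gives 12 + 2 = 14 bits
	nonce := solve(msg, d)
	fmt.Println(d, nonce, check(msg, nonce, d))
}
```

The asymmetry is the point: solving costs about 2^d hashes while checking costs one, so flooding a busy board gets expensive while verification stays cheap for everyone.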
As Bob’s message propagates through the swarm, each peer appends it to their copy of the board and starts a lightweight converging process to eventually reach a consistent state across the swarm. At the network level, boards are very similar to threads: they are keyed by the public key their owner issued for them, content moderation (shadow and unshadow) is done through digital signatures, and they are internally represented as an append-only log using Merkle-CRDTs.
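A much simplified picture of the append-only log in Go: events are content-addressed by a hash over their parents and payload, each replica is a grow-only set of them, and merging is plain set union. `Event`, `Replica`, and the method names are hypothetical. Actual Merkle-CRDTs fetch missing ancestors by hash instead of copying whole replicas, and the payloads (posts, shadow and unshadow actions) would be signed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// Event is one log entry: a post or a moderation action.
type Event struct {
	Parents []string // hashes of the heads the author had seen
	Payload string
}

// id content-addresses an event; parents are fixed-length hex, and a zero
// byte separates them from the payload.
func (e Event) id() string {
	h := sha256.New()
	for _, p := range e.Parents {
		h.Write([]byte(p))
	}
	h.Write([]byte{0})
	h.Write([]byte(e.Payload))
	return hex.EncodeToString(h.Sum(nil))
}

// Replica is a grow-only set of events keyed by their hash.
type Replica struct{ events map[string]Event }

func NewReplica() *Replica { return &Replica{events: map[string]Event{}} }

// Heads returns the events no other event points to, sorted for determinism.
func (r *Replica) Heads() []string {
	referenced := map[string]bool{}
	for _, e := range r.events {
		for _, p := range e.Parents {
			referenced[p] = true
		}
	}
	var heads []string
	for id := range r.events {
		if !referenced[id] {
			heads = append(heads, id)
		}
	}
	sort.Strings(heads)
	return heads
}

// Append adds a local event on top of all current heads.
func (r *Replica) Append(payload string) string {
	e := Event{Parents: r.Heads(), Payload: payload}
	id := e.id()
	r.events[id] = e
	return id
}

// Merge is set union: commutative, associative, and idempotent, so replicas
// that have seen the same events end up identical in any delivery order.
func (r *Replica) Merge(other *Replica) {
	for id, e := range other.events {
		r.events[id] = e
	}
}

func (r *Replica) Len() int { return len(r.events) }

func main() {
	alice, bob := NewReplica(), NewReplica()
	alice.Append("board created")
	bob.Merge(alice)
	alice.Append("post by Alice")
	bob.Append("post by Bob") // concurrent with Alice's post: the log branches
	alice.Merge(bob)
	bob.Merge(alice)
	fmt.Println(alice.Len(), bob.Len(), len(alice.Heads())) // 3 3 2
}
```

Two concurrent posts leave the log with two heads; the next append lists both as parents and closes the fork, which matches the "blockchain with multiple branches" description in the FAQ.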
When Alice joins the network, she downloads the relevant board from its swarm to discover Bob’s thread, or gets notified of it through the pub/sub if she was already there when Bob created it. Finally, she learns the key of Bob’s report and starts downloading it. If she wants to thank Bob, she can reply to him in his thread: to do so, she downloads the thread from its swarm and publishes a post in much the same way Bob published his thread on the board.
Here is my vision:
| Version | Milestone |
|---|---|
| 0.2 | DHT, UDP transport, NAT traversal. |
| 0.4 | Swarms, tunnels, garlic routing. |
| 0.6 | Merkle-CRDTs, GossipSub, Hashcash. |
| 0.8 | Boards, threads, posts, attachments. |
| 1.0 | Decent UI, UX, test coverage, docs. |
| 1.2 | Restricted routes, traffic obfuscation. |
| 1.4 | HTTPS and steganography transports. |
| 1.6 | Cross-board distributed search index. |
| 1.8 | Lightweight native Android client. |
| 2.0 | Bluetooth mesh, Wi-Fi Direct routing. |
This is pretty ambitious and will take quite some time to implement, test, and document. There are still a lot of unknowns, such as the best network parameters, peer scoring functions, and other heuristics. Things like these are typically approximated using machine learning, but we cannot even do that because of the distributed nature of the system (hence the lack of datasets to train on). We can reuse years’ worth of experimental data from existing projects such as BitTorrent, Tor, and I2P, but a lot of tuning will still be required before we converge on optimal levels of performance, security, and stability.
For the development process, I am going to adopt the same enhancement proposal system as Python with its PEPs and BitTorrent with its BEPs. During the next few weeks, I am going to write the first several Pepe Enhancement Proposals and publish them in the peps repository. Whereas bug reports may go directly under the Issues tab in the pepe repository, I expect all non-trivial feature requests to come with PEP pull requests. You can see those as mandatory design documents if you like. Once we reach version 0.8, you are welcome to join the development in /en/pepe/ instead of GitHub.
Pepe is never going to have paid ads, fees, premium memberships, or telemetry to sell. At the same time, I am going to be its lead developer (actually, the only developer for now) and sole maintainer. Like many of you, I have a 40-hour-a-week job and a family to support. I am planning to work on Pepe on weekends no matter what, but at that pace it is going to take a long time to reach a release. With your support, I hope to switch to working on Pepe full time. Please consider donating using these methods:
Bitcoin (BTC) which is the most popular decentralized cryptocurrency with enormous market capitalization. If you are interested, you can learn how to use it at bitcoin.org and donate using this address:
Ethereum (ETH), one of the most technologically advanced blockchains, whose cryptocurrency is the second-largest by market capitalization after Bitcoin. You can learn how to use it at ethereum.org and donate using this address:
Monero (XMR), a privacy-focused cryptocurrency that uses an obfuscated public ledger, meaning anyone can send or broadcast transactions, but no outside observer can tell the source, the amount, or the destination. You can learn more about it at getmonero.org and donate using this address:
Every month or two, I am going to publish a post about the current progress and how much money I have collected through donations. You do not have to pay to get access to them (they are going to be published on this blog) or to use Pepe once it is ready. Pepe is going to be available for free to everyone and released under the GNU AGPLv3+.
I am just a regular schnook.
I do not like fame. This specifically includes that part where people who like to get offended about stuff go crazy and burn down my house.
I chose it to be the same as the GPG key ID I use to sign everything including my commits:
There is a huge misunderstanding: Pepe the Frog has always been a symbol of resistance in underground culture; it was journalists who decided it was evil without understanding a damn thing.
It is the only feasible way for us to significantly speed up development without revealing your identity or mine.
There is none and will never be. No ads, no fees, no premium shit, no telemetry. Free as in freedom.
Yes, you can. If it is something digital, I encourage you to set a price not for a single copy but for releasing it into the public domain.
It should not, as I am not breaking any terms of service, acceptable use policies, or community guidelines. In the unlikely event that an external force makes GitHub suspend or delete my account, we will continue development on /en/pepe/ inside Pepe itself. Just in case, consider saving my public key so that you can later verify my commits and releases.
I am not a lawyer and this is not legal advice, but yes, it is legal to both develop and use Pepe, at least as of the last time I checked (see when this post was last updated at the top of the page). The only known exception is France, where you could be restricted by DADVSI; in that case, consult a lawyer.
Reasonably, for practical use. I hope to eventually surpass the level of security of I2P. You should read about its threat model, because it mostly applies to Pepe as well. Using restricted tunnels should mitigate the biggest vulnerabilities, such as intersection and traffic analysis attacks, for those who really care about them.
Reasonably, for practical use. Bandwidth should be on the order of megabits per second. As for latency, do not expect miracles: it will be on the order of seconds.
I believe in a culture of openness and feel that exclusivity has very little value, especially in the presence of strong anonymity. You are still free to use public-key cryptography, but it will likely not be integrated into the imageboard.
Believe it or not, Android-based smartphones are available for purchase and are in fact quite popular among North Korean citizens. There is even a 3G network that covers over 90% of the population, albeit without any apparent gateways to the internet. Once Pepe works over Bluetooth mesh (which should be by v2.0), people should be able to exchange information anonymously and self-organize in big cities to do whatever they feel like doing.
I am planning to gradually increase support for various operating systems and hardware architectures with each new version of Pepe:
| 32-bit x86 | v0.4 | probably never | v0.6 | v0.4 | v1.8 |
| 64-bit ARM | v0.4 | v0.8 | no idea when | v0.6 | v1.8 |
| 32-bit ARM | v0.2 | N/A | no idea when | v0.6 | v1.8 |
| 64-bit RISC-V | v1.2 | N/A | N/A | no idea when | N/A |
Because there is no feasible way to install applications outside of Apple’s App Store without jailbreaking the device you “own” (which might still be illegal in some jurisdictions).
For the most part, in Go. I spent quite some time choosing between C and Go and decided to go with the latter for multiple reasons:
I love Rust, but its ecosystem is still too immature, so practicality wins here again. I have been using C++ for almost half of my life, and I still cannot stand Stroustrup’s monster. As for Java, I have big plans for running Pepe on SBCs, which tend not to have much RAM, so no way.
The GNU AGPL is recommended by the FSF for any software that will commonly be run over a network. This is also a nice opportunity to legally protect our community from for-profit organizations, which generally avoid AGPL software like the plague, and from government agencies, which will have to either break the license terms in order to mess with the network or content themselves with a research pool of nodes that do not drop requests from modified clients.
It is strong anonymity and low bandwidth overhead. But latency will be measured in seconds, yes.
It sort of is, actually. Each thread, each board, and the metaboard are represented with a Merkle-CRDT data structure which is basically a blockchain but with multiple branches.
There is very little evidence that unidirectional tunnels are superior from the security perspective, so I decided to go with bidirectional ones for practicality reasons.
Instead of mainline Kademlia, I am planning to use a slightly modified version of R⁵N as it outperforms Kademlia in restricted networks and is more resistant to some of the known vulnerabilities such as poisoning and Sybil attacks. But this has not been finalized yet.
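Roughly speaking, R⁵N forwards a request to random peers for its first few hops and then switches to greedy, Kademlia-style routing towards the key under the XOR metric. A single-path Go sketch of that forwarding decision follows; the 4-byte IDs, `nextHop`, and the fixed `randomHops` parameter are simplifications (R⁵N ties the random-walk length to an estimate of the network size and can forward to several peers at once), and the planned modifications are not specified here.

```go
package main

import (
	"fmt"
	"math/rand"
)

// ID is shortened to 4 bytes to keep the example readable; real IDs are hashes.
type ID [4]byte

// xorCloser reports whether a is strictly closer to key than b under XOR
// distance, comparing the distances as big-endian integers.
func xorCloser(a, b, key ID) bool {
	for i := range key {
		da, db := a[i]^key[i], b[i]^key[i]
		if da != db {
			return da < db
		}
	}
	return false
}

// nextHop chooses where to forward a request for key. While hop < randomHops
// the choice is uniform (random-walk phase, which needs rng); afterwards it is
// greedy: the neighbour closest to key (rng may then be nil).
func nextHop(neighbours []ID, key ID, hop, randomHops int, rng *rand.Rand) ID {
	if hop < randomHops {
		return neighbours[rng.Intn(len(neighbours))]
	}
	best := neighbours[0]
	for _, n := range neighbours[1:] {
		if xorCloser(n, best, key) {
			best = n
		}
	}
	return best
}

func main() {
	rng := rand.New(rand.NewSource(1))
	neighbours := []ID{{0x10, 0, 0, 0}, {0x80, 0, 0, 0}, {0x13, 0xff, 0, 0}}
	key := ID{0x12, 0, 0, 0}
	const randomHops = 2 // hypothetical value
	for hop := 0; hop < 4; hop++ {
		next := nextHop(neighbours, key, hop, randomHops, rng)
		fmt.Printf("hop %d -> %x\n", hop, next[:])
	}
}
```

The random prefix is what helps in restricted networks: it spreads requests away from the local neighbourhood before greedy routing takes over, so a few well-placed malicious peers near the requester see less of the traffic.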
You cannot, sorry about that. Feel free to reach out to me publicly on GitHub instead. I am most likely not going to spend time answering random questions, though; please stick to the bug report and PEP formats (or fixing/implementing them in pull requests).