Justin Spencer: ZeroMQ The Guide by Pieter Hintjens

ZeroMQ - The Guide

ZeroMQ looks like an embeddable networking library but acts like a concurrency framework. It gives you sockets that carry atomic messages across various transports like in-process, inter-process, TCP, and multicast. You can connect sockets N-to-N with patterns like fan-out, pub-sub, task distribution, and request-reply.
ZeroMQ sockets are the world-saving superheroes of the networking world.
More generally, "zero" refers to the culture of minimalism that permeates the project. We add power by removing complexity rather than by exposing new functionality.
We assume you care about scale, because ZeroMQ solves that problem above all others.
Programming is science dressed up as art because most of us don't understand the physics of software and it's rarely, if ever, taught. The physics of software is not algorithms, data structures, languages and abstractions. These are just tools we make, use, throw away. The real physics of software is the physics of people--specifically, our limitations when it comes to complexity, and our desire to work together to solve large problems in pieces.
This is the science of programming: make building blocks that people can understand and use easily, and people will work together to solve the very largest problems.
We live in a connected world, and modern software has to navigate this world. So the building blocks for tomorrow's very largest solutions are connected and massively parallel.
Even connecting a few programs across a few sockets is plain nasty when you start to handle real life situations.
Connecting computers is so difficult that software and services to do this is a multi-billion dollar business.
To fix the world, we needed to do two things. One, to solve the general problem of "how to connect any code to any code, anywhere". Two, to wrap that up in the simplest building blocks that people could understand and use easily.
ZeroMQ doesn't know anything about the data you send except its size in bytes. That means you are responsible for formatting it safely so that applications can read it back.
When you receive string data from ZeroMQ in C, you simply cannot trust that it's safely terminated. Every single time you read a string, you should allocate a new buffer with space for an extra byte, copy the string, and terminate it properly with a null.
ZeroMQ strings are length-specified and are sent on the wire without a trailing null.
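The advice above targets C, but the three steps it names (allocate one extra byte, copy, terminate) can be sketched with Python's ctypes, which exposes C-style buffers. This is only an illustration: `frame_to_c_string` is a hypothetical helper, not part of any ZeroMQ binding, and `frame` stands in for the bytes of a received message.

```python
import ctypes


def frame_to_c_string(frame: bytes) -> ctypes.Array:
    """Copy a length-specified frame into a null-terminated C buffer."""
    size = len(frame)
    # Allocate a new buffer with space for one extra byte.
    buf = ctypes.create_string_buffer(size + 1)
    # Copy exactly `size` bytes; the frame carries no trailing null.
    ctypes.memmove(buf, frame, size)
    # Terminate explicitly rather than relying on the zero-filled buffer.
    buf[size] = b"\0"
    return buf
```

Calling `frame_to_c_string(b"Hello")` yields a six-byte buffer whose last byte is the terminator, so it is safe to hand to C code that expects a C string.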
There is one more important thing to know about PUB-SUB sockets: you do not know precisely when a subscriber starts to get messages.
Making a TCP connection involves to and from handshaking that takes several milliseconds depending on your network and the number of hops between peers.
The alternative to synchronization is to simply assume that the published data stream is infinite and has no start and no end.
Write nice code. Ugly code hides problems and makes it hard for others to help you. Use consistent indentation and clean layout. Write nice code and your world will be more comfortable.
Test what you make as you make it. When your program doesn't work, you should know what five lines are to blame.
When you find that things don't work as expected, break your code into pieces, test each one, see which one is not working.
Make abstractions (classes, methods, whatever) as you need them. If you copy/paste a lot of code, you're going to copy/paste errors, too.
Classy programmers share the same motto as classy hit men: always clean-up when you finish the job.
Memory leaks are one thing, but ZeroMQ is quite finicky about how you exit an application.
If you are opening and closing a lot of sockets, that's probably a sign that you need to redesign your application.
Do not try to use the same socket from multiple threads.
Blocking I/O creates architectures that do not scale well. But background I/O can be very hard to do right.
It is incredibly wasteful for teams to be building this particular [message passing] wheel over and over.
It turns out that building reusable messaging systems is really difficult, which is why few FOSS projects ever tried, and why commercial messaging products are complex, expensive, inflexible, and brittle.
This is ZeroMQ: an efficient, embeddable library that solves most of the problems an application needs to become nicely elastic across a network, without much cost.
Traditional network programming is built on the general assumption that one socket talks to one connection, one peer. There are multicast protocols, but these are exotic. When we assume "one socket = one connection", we scale our architectures in certain ways. We create threads of logic where each thread works with one socket, one peer. We place intelligence and state in these threads.
In the ZeroMQ universe, sockets are doorways to fast little background communications engines that manage a whole set of connections auto-magically for you.
ZeroMQ sockets carry messages, like UDP, rather than a stream of bytes as TCP does. A ZeroMQ message is length-specified binary data.
ZeroMQ is not a neutral carrier: it imposes a framing on the transport protocols it uses. This framing is not compatible with existing protocols, which tend to use their own framing.
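To make "length-specified" concrete, here is a minimal sketch of length-prefixed framing over a byte stream. It is an illustration of the general idea only, not ZeroMQ's actual wire format; the 4-byte big-endian prefix and the names `encode_frame` and `decode_frames` are assumptions made for this example.

```python
import struct

HEADER = struct.Struct(">I")  # assumed 4-byte big-endian length prefix


def encode_frame(payload: bytes) -> bytes:
    """Prefix the payload with its length in bytes."""
    return HEADER.pack(len(payload)) + payload


def decode_frames(stream: bytes):
    """Split a byte stream into complete payloads plus any leftover bytes."""
    frames = []
    offset = 0
    while len(stream) - offset >= HEADER.size:
        (size,) = HEADER.unpack_from(stream, offset)
        start = offset + HEADER.size
        if len(stream) - start < size:
            break  # the last frame has not fully arrived yet
        frames.append(stream[start:start + size])
        offset = start + size
    return frames, stream[offset:]
```

The receiver gets whole messages or nothing: a partially received frame stays in the leftover bytes until the rest arrives, which is the property that distinguishes messages from a raw TCP byte stream.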
The built-in core ZeroMQ patterns are:

Request-reply, which connects a set of clients to a set of services. This is a remote procedure call and task distribution pattern.
Pub-sub, which connects a set of publishers to a set of subscribers. This is a data distribution pattern.
Pipeline, which connects nodes in a fan-out/fan-in pattern that can have multiple steps and loops. This is a parallel task distribution and collection pattern.
Exclusive pair, which connects two sockets exclusively. This is a pattern for connecting two threads in a process, not to be confused with "normal" pairs of sockets.

Too many static pieces are like liquid concrete: knowledge is distributed and the more static pieces you have, the more effort it is to change the topology.
A bridge is a small application that speaks one protocol at one socket, and converts to/from a second protocol at another socket. A protocol interpreter, if you like.
Processes, we believe, should be as vulnerable as possible to internal errors, and as robust as possible against external attacks and errors. To give an analogy, a living cell will self-destruct if it detects a single internal error, yet it will resist attack from the outside by all means possible.
Assertions are absolutely vital to robust code; they just have to be on the right side of the cellular wall. And there should be such a wall. If it is unclear whether a fault is internal or external, that is a design flaw to be fixed.
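One way to picture the "cellular wall" is a sketch like the following, where external input is validated and rejected with an ordinary error, while internal invariants are asserted. The tiny PING/PONG protocol and the names `ProtocolError`, `parse_command`, and `reply_for` are invented for illustration. Note that Python strips `assert` statements when run with `-O`, so this shows the idea rather than a hardened implementation.

```python
VALID_COMMANDS = {"PING", "PONG"}


class ProtocolError(Exception):
    """Raised when a peer sends something we cannot accept."""


def parse_command(raw: bytes) -> str:
    """Outside the wall: never trust the peer, never assert on its data."""
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError as exc:
        raise ProtocolError("command is not ASCII") from exc
    command = text.strip().upper()
    if command not in VALID_COMMANDS:
        raise ProtocolError("unknown command: %r" % command)
    return command


def reply_for(command: str) -> str:
    """Inside the wall: a bad value here is our own bug, so fail loudly."""
    assert command in VALID_COMMANDS, "caller passed an unvalidated command"
    return "PONG" if command == "PING" else "PING"
```

A malformed message from a peer produces a `ProtocolError` that the caller can handle and survive; an unvalidated value reaching `reply_for` means the program itself is wrong, and the assertion stops it.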
Real code should do error handling on every single ZeroMQ call.
Any long-running application has to manage memory correctly, or eventually it'll use up all available memory and crash.
To make utterly perfect MT programs (and I mean that literally), we don't need mutexes, locks, or any other form of inter-thread communication except messages sent across ZeroMQ sockets.
If there's one lesson we've learned from 30+ years of concurrent programming, it is: just don't share state.
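The "don't share state" rule can be sketched without ZeroMQ at all. In the example below the standard library's `queue.Queue` stands in for the sockets the guide has in mind (an assumption made so the sketch runs anywhere); the worker keeps its state private and the threads exchange only messages. The names `worker` and `sum_in_worker` are hypothetical.

```python
import queue
import threading


def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Own all state privately; communicate only through messages."""
    total = 0  # private to this thread, never touched by anyone else
    while True:
        message = inbox.get()
        if message is None:  # sentinel message: no more work
            break
        total += message
    outbox.put(total)


def sum_in_worker(numbers) -> int:
    """Send numbers to a worker thread and wait for its single reply."""
    inbox, outbox = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=worker, args=(inbox, outbox))
    thread.start()
    for number in numbers:
        inbox.put(number)
    inbox.put(None)
    result = outbox.get()
    thread.join()
    return result
```

No lock appears in the application code because no variable is visible to more than one thread; the only shared objects are the message channels.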
The list of weird problems that you need to fight as you write classic shared-state MT code would be hilarious if it didn't translate directly into stress and risk, as code that seems to work suddenly fails under pressure.
Some widely used models, despite being the basis for entire industries, are fundamentally broken, and shared state concurrency is one of them.
This is a recurring theme with ZeroMQ: the world's problems are diverse and you can benefit from solving different problems each in the right way.
Getting applications to properly shut down when you send them Ctrl-C can be tricky.
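A common approach, sketched here under the assumption of a simple polling loop, is to have the signal handler do nothing except set a flag, and let the main loop notice the flag and clean up on its way out. `ShutdownFlag` and `run` are hypothetical names; `signal.signal` must be called from the main thread.

```python
import signal


class ShutdownFlag:
    """Records that the user asked the program to stop."""

    def __init__(self):
        self.requested = False

    def handle(self, signum, frame):
        # Keep the handler trivial: set the flag and return.
        self.requested = True

    def install(self):
        # Call from the main thread, before entering the loop.
        signal.signal(signal.SIGINT, self.handle)
        signal.signal(signal.SIGTERM, self.handle)


def run(flag, do_work, cleanup):
    """Work until a shutdown is requested, then always clean up."""
    try:
        while not flag.requested:
            do_work()
    finally:
        cleanup()
```

In a real application `do_work` would be a receive with a timeout rather than a blocking call, so the loop gets a chance to check the flag, and `cleanup` would close sockets before the program exits.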
Most people who speak of "reliability" don't really know what they mean. We can only define reliability in terms of failure. That is, if we can handle a certain set of well-defined and understood failures, then we are reliable with respect to those failures. No more, no less.
So to make things brutally simple, reliability is "keeping things working properly when code freezes or crashes", a situation we'll shorten to "dies".
Heartbeating solves the problem of knowing whether a peer is alive or dead.
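As a rough sketch of the bookkeeping behind heartbeating: each side remembers when it last heard from its peer and declares the peer dead after a number of silent intervals. The class name `PeerMonitor`, the interval, and the "three missed beats" threshold are assumptions for illustration, and timestamps are passed in explicitly so the logic is easy to test.

```python
class PeerMonitor:
    """Tracks whether a peer has been heard from recently enough."""

    def __init__(self, interval: float, max_missed: int, now: float):
        self.interval = interval      # seconds between expected heartbeats
        self.max_missed = max_missed  # silent intervals tolerated
        self.last_seen = now

    def heard_from_peer(self, now: float) -> None:
        """Any message from the peer counts as a sign of life."""
        self.last_seen = now

    def is_alive(self, now: float) -> bool:
        """The peer is dead once it has been silent for the full window."""
        return (now - self.last_seen) < self.interval * self.max_missed
```

In running code the `now` values would typically come from `time.monotonic()`, and a peer judged dead would be disconnected and, where it makes sense, reconnected.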
The nice thing about progress is how fast it happens when lawyers and committees aren't involved.
This is how we should design complex architectures: start by writing down the contracts, and only then write software to implement them.
Theory is great in theory, but in practice, practice is better.
A good design principle that I use whenever possible is to not invent concepts that are not absolutely essential.
If you make a nontrivial protocol and you expect applications to implement it properly, most developers will get it wrong most of the time.
There are three main open source patterns. The first is the large firm dumping code to break the market for others. This is the Apache Foundation model. The second is tiny teams or small firms building their dream. This is the most common open source model, which can be very successful commercially. The last is aggressive and diverse communities that swarm over a problem landscape. This is the Linux model, and the one to which we aspire with ZeroMQ.
It's hard to overemphasize the power and persistence of a working open source community. There really does not seem to be a better way of making software for the long term.
Software dies, but community survives.
My main takeaway from a long career of projects of every conceivable format is: if you want to build truly large-scale and long-lasting software, aim to build a free software community.
Architecture is the art and science of making large artificial structures for human use. If there is one thing I've learned and applied successfully in 30 years of making larger and larger software systems, it is this: software is about people. Large structures in themselves are meaningless. It's how they function for human use that matters. And in software, human use starts with the programmers who make the software itself.
The core problems in software architecture are driven by human psychology, not technology.
One of the tenets of Social Architecture is that how we organize is more significant than who we are. The same group, organized differently, can produce wholly different results.
Ordinary people, well connected, can far outperform a team of experts using poor patterns.
The two most important psychological elements are that we're really bad at understanding complexity and that we are so good at working together to divide and conquer large problems. We're highly social apes, and kind of smart, but only in the right kind of crowd.
Stupidity: our mental bandwidth is limited, so we're all stupid at some point. The architecture has to be simple to understand. This is the number one rule: simplicity beats functionality, every single time. If you can't understand an architecture on a cold gray Monday morning before coffee, it is too complex.
Selfishness: we act only out of self-interest, so the architecture must create space and opportunity for selfish acts that benefit the whole. Selfishness is often indirect and subtle.
Laziness: we make lots of assumptions, many of which are wrong. We are happiest when we can spend the least effort to get a result or to test an assumption quickly, so the architecture has to make this possible. Specifically, that means it must be simple.
Jealousy: we're jealous of others, which means we'll overcome our stupidity and laziness to prove others wrong and beat them in competition. The architecture thus has to create space for public competition based on fair rules that anyone can understand.
Fear: we're unwilling to take risks, especially if it makes us look stupid. Fear of failure is a major reason people conform and follow the group in mass stupidity. The architecture should make silent experimentation easy and cheap, giving people opportunity for success without punishing failure.
Reciprocity: we'll pay extra in terms of hard work, even money, to punish cheats and enforce fair rules. The architecture should be heavily rule-based, telling people how to work together, but not what to work on.
Conformity: we're happiest to conform, out of fear and laziness, which means if the patterns are good, clearly explained and documented, and fairly enforced, we'll naturally choose the right path every time.
Pride: we're intensely aware of our social status, and we'll work hard to avoid looking stupid or incompetent in public. The architecture has to make sure every piece we make has our name on it, so we'll have sleepless nights stressing about what others will say about our work.
Greed: we're ultimately economic animals, so the architecture has to give us economic incentive to invest in making it happen. Maybe it's polishing our reputation as experts, maybe it's literally making money from some skill or component. It doesn't matter what it is, but there must be economic incentive. Think of architecture as a market place, not an engineering design.
The truth about human nature is not that pretty. We're not really angels, nor devils, just self-interested winners descended from a billion-year unbroken line of winners. In business, marriage, and collective works, sooner or later, we either stop caring, or we fight and we argue.
Long-term survival means enduring the bad times, as well as enjoying the good ones.
The [software] license we choose modifies the economics of those who use our work.
Your goal as leader of a community is to motivate people to get out there and explore; to ensure they can do so safely and without disturbing others; to reward them when they make successful discoveries; and to ensure they share their knowledge with everyone else.
Plan your own retirement well before someone else decides you are their next problem.
You need a goal that's crazy and simple enough to get people out of bed in the morning. Your community has to attract the very best people and that demands something special.
Your work must be beautiful, immediately useful, and attractive. Your contributors are users who want to explore just a little beyond where they are now. Make it simple, elegant, and brutally clean. The experience when people run or use your work should be an emotional one. They should feel something, and if you accurately solved even just one big problem that until then they didn't quite realize they faced, you'll have a small part of their soul.
It [your project] must be easy to understand, use, and join. Too many projects have barriers to access.
A group of like-minded experts cannot explore the problem landscape well. They tend to make big mistakes. Diversity beats education any time.
Transparency is essential to get trust, which is essential to get scale. By forcing every single change through a single transparent process, you build real trust in the results.
Another cardinal sin that many open source developers make is to place themselves above others.
Your job, as founder of a project, is not to impose your vision of the product over others, but to make sure the rules are good, honest, and enforced.
One of the saddest myths of the knowledge business is that ideas are a sensible form of property. It's medieval nonsense that should have been junked along with slavery, but sadly it's still making too many powerful people too much money.
Ideas are cheap.
What works today often won't work tomorrow, yet structures become more solid, not more flexible, over time.
We humans are really good at specialization. Asking us to be really good at two contradictory things reduces the number of candidates sharply, which is a bad thing for any project.
Going very fast in the wrong direction is not just useless, it's actively damaging.
Curious observation: people who thrive in complex situations like to create complexity because it keeps their value high.
Developers should not be made to feel stupid by their tools.
There are several reasons for not logging ideas, suggestions, or feature requests. In our experience, these just accumulate in the issue tracker until someone deletes them. But more profoundly, when we treat all changes as problem solutions, we can prioritize trivially.
Promoting the most active and consistent maintainers is good for everyone.
One of git's most popular features is its branches.
I'm a great believer in popular wisdom, but sometimes you have to recognize mass delusion for what it is.
Now, perhaps historians will feel robbed, but I honestly can't see that the historical minutiae of who changed what, when, including every branch and experiment, are worth any significant pain or friction.
My own opinion is that history will judge git branches and patterns like git-flow as a complex solution to imaginary problems inherited from the days of Subversion and monolithic repositories.
The simpler, the better.
Circumstantial evidence is thus that branches lead to more complexity than forks.
The smaller and more rapid the delivery, the better.
The smoother the learning curve, the better.
Evidence definitely shows that learning to use git branches is complex.
For most developers, every cycle spent learning git is a cycle lost on more productive things.
The lower the cost of failure, the better.
Branches demand more perfection from developers because mistakes potentially affect others. This raises the cost of failure. Forks make failure extremely cheap because literally nothing that happens in a fork can affect others not using that fork.
The less need for up-front coordination, the better.
The more you can scale a project, the better.
The less surprising, the better.
Sometimes better ways of working are surprising at first.
The more tangible the rewards, the better.
The more a model can survive conflict, the better.
Like it or not, people fight over ego, status, beliefs, and theories of the world. Challenge is a necessary part of science.
The stronger the isolation between production code and experiment, the better.
The more visible our work, the better.
Git is not an easy tool to master.
Innovation really just means solving problems more cheaply.
Software engineers don't like the notion that powerful, effective solutions can come into existence without an intelligent designer actively thinking things through.
In the dominant theory of innovation, brilliant individuals reflect on large problem sets and then carefully and precisely create a solution. [...] Look more closely, however, and you will see that the facts don't match. History doesn't show lone inventors. It shows lucky people who steal or claim ownership of ideas that are being worked on by many.
Here thus is an alternative theory of innovation:

There is an infinite problem/solution terrain.
This terrain changes over time according to external conditions.
We can only accurately perceive problems to which we are close.
We can rank the cost/benefit economics of problems using a market for solutions.
There is an optimal solution to any solvable problem.
We can approach this optimal solution heuristically, and mechanically.
Our intelligence can make this process faster, but does not replace it.

Individual creativity matters less than process. Smarter people may work faster, but they may also work in the wrong direction. It's the collective vision of reality that keeps us honest and relevant.
We don't need road maps if we have a good process. Functionality will emerge and evolve over time as solutions compete for market share.
We don't invent solutions so much as discover them. All sympathies to the creative soul. It's just an information processing machine that likes to polish its own ego and collect karma.
Intelligence is a social effect, though it feels personal. A person cut off from others eventually stops thinking. We can neither collect problems nor measure solutions without other people.
The size and diversity of the community is a key factor. Larger, more diverse communities collect more relevant problems, and solve them more accurately, and do this faster, than a small expert group.
So, when we trust the solitary experts, they make classic mistakes. They focus on ideas, not problems. They focus on the wrong problems. They make misjudgments about the value of solving problems. They don't use their own work.
Living products consist of long series of patches, applied one atop the other.
The most popular design process in large business seems to be Trash-Oriented Design, or TOD. TOD feeds off the belief that all we need to make money are great ideas.
Ideas are cheap. No exceptions. There are no brilliant ideas.
The starting point for a good design process is to collect real problems that confront real people. The second step is to evaluate these problems with the basic question, "How much is it worth to solve this problem?" Having done that, we can collect that set of problems that are worth solving.
Good solutions to real problems will succeed as products. Their success will depend on how good and cheap the solution is, and how important the problem is.
Complexity-Oriented Design is characterized by a team obsessively solving the wrong problems in a form of collective delusion. COD products tend to be large, ambitious, complex, and unpopular.
It is insanely hard for engineers to stop extending a design to cover more potential problems.
Making stuff that you don't immediately have a need for is pointless.
Problems are not equal. Some are simple, and some are complex. Ironically, solving the simpler problems often has more value to more people than solving the really hard ones. So if you allow engineers to just work on random things, they'll mostly focus on the most interesting but least worthwhile things.
Engineers and designers love to make stuff and decoration, and this inevitably leads to complexity. It is crucial to have a "stop mechanism", a way to set short, hard deadlines that force people to make smaller, simpler answers to just the most crucial problems.
This process [Simplicity-Oriented Design] starts with a realization: we do not know what we have to make until after we start making it. Coming up with ideas or large-scale designs isn't just wasteful, it's a direct hindrance to designing the truly accurate solutions.
You need to keep mobile, pack light, and move fast.
A perfect "patch" solves a problem with zero learning required by the user.
To get the most out of SOD the designer has to use the product continuously, from day one, and develop his or her ability to smell out problems such as inconsistency, surprising behavior, and other forms of friction.
Design is about removing friction in the use of a product.
In any project, we need some kind of reward to make it worth continuing each day.
It's a management truism: if someone in your organization is irreplaceable, get rid of him or her.
There is a simple cure for burnout that works in at least some cases: get paid decently for your work.
Never design anything that's not a precise minimal answer to a problem we can identify and have to solve.
The control of a large force is the same principle as the control of a few men: it is merely a question of dividing up their numbers.
The Benevolent Tyrant divides large problems into smaller ones and throws them at groups to focus on. The Benevolent Tyrant constructs a supply chain that starts with problems, and results in usable solutions. She is ruthless about how the supply chain works, but does not tell people what to work on, nor how to do their work.
The ideal team consists of two sides: one writing code, and one providing feedback.
The accuracy of knowledge comes from diversity.
Perfection precludes participation.
Make no plans. Set goals, develop strategies and tactics.
If you know the enemy and know yourself, you need not fear the result of a hundred battles.
He will win whose army is animated by the same spirit throughout all its ranks.
After crossing a river, you should get far away from it.
Code, like all knowledge, works best as collective--not private--property.
Water shapes its course according to the nature of the ground over which it flows.
Physical closeness is essential for high-bandwidth communications.
Pain is not, generally, a good sign.
People should feel joy in their work.
Never interrupt others when they are making mistakes.
The Hangman knows that we learn only by making mistakes, and she gives others copious rope with which to learn. She only pulls the rope gently, when it's time. A little tug to remind the other of their precarious position. Allowing others to learn by failure gives the good reason to stay, and the bad excuse to leave. The Hangman is endlessly patient, because there is no shortcut to the learning process.
Keeping the public record may be tedious, but it's the only way to prevent collusion.
No one really reads the archives, but the simple possibility stops most abuses.
When a man knows he is to be hanged in a fortnight, it concentrates his mind wonderfully.
Deadlines bring people together and focus the collective mind.
An external enemy can move a passive team into action.
When people argue or complain, just write them a Sun Tzu quotation.
Mistakes in slow motion are often harder to see (or rather, easier to rationalize away).
A good software architecture depends on contracts, and the more explicit they are, the better things scale.
Start simple, and develop your specifications step-by-step. Don't solve problems you don't have in front of you.
Use very clear and consistent language.
Try to avoid inventing concepts.
Make nothing for which you cannot demonstrate an immediate need. Your specification solves problems; it does not provide features. Make the simplest plausible solution for each problem that you identify.
Implement your protocol as you build it, so that you are aware of the technical consequences of each choice.
Test your specification as you build it on other people. Your best feedback on a specification is when someone else tries to implement it without the assumptions and knowledge that you have in your head.
Only use constructs that are independent of programming language and operating system.
The point about a written specification is that no matter how weak it is, it can be systematically improved. By writing down a specification, you will also spot inconsistencies and gray areas that are impossible to see in code.
My advice when writing protocol specs is to learn and use a formal grammar. It's just less hassle than allowing others to interpret what you mean, and then recover from the inevitable false assumptions. The target of your grammar is other people, engineers, not compilers.
Protocol designers who don't separate control from data tend to make horrid protocols, because the trade-offs in the two cases are almost totally opposed.
Use a profiler. There's simply no way to know what your code is doing until you've profiled it for function counts and for CPU cost per function. When you find your hot spots, fix them.
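For a sense of what "function counts and CPU cost per function" looks like in practice, here is a small sketch using Python's built-in `cProfile` and `pstats` modules. The helper name `profile_call` is hypothetical, and the guide's advice is not tied to any particular profiler.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under the profiler; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    # Sort by cumulative time so the hot spots come first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, report.getvalue()
```

The report lists, for each function, how many times it was called and how much time was spent in it, which is the information needed to find hot spots before changing any code.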
Eliminate memory allocations. The heap is very fast on a modern Linux kernel, but it's still the bottleneck in most naive codecs. [...] Use local variables (the stack) instead of the heap where you can.
Know your data. The best compression techniques require knowing about the data.
Do not invent concepts. The job of a designer is to remove problems, not add features.
A protocol has at least two levels:

How we represent individual messages on the wire.
How messages flow between peers, and the significance of each message.

The future is clearly wireless, and while many big businesses live by concentrating data in their clouds, the future doesn't look quite so centralized. The devices at the edges of our networks get smarter every year, not dumber.
A truly wireless world would bypass all central censorship. It's how the internet was designed, and it's quite feasible, technically (which is the best kind of feasible).
To understand how WiFi performs technically, you need to understand a basic law of physics: the power required to connect two points increases according to the square of the distance.
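As a worked instance of the relation quoted above, assuming idealized free-space propagation with no obstacles or interference, doubling the distance between two points quadruples the power required to connect them:

```latex
P_{\text{required}}(d) \propto d^{2}
\quad\Longrightarrow\quad
\frac{P_{\text{required}}(2d)}{P_{\text{required}}(d)}
  = \frac{(2d)^{2}}{d^{2}} = 4
```

By the same reasoning, a device that is ten times farther away needs a hundred times the power.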
Mesh [networking] removes the access point completely, at least in the imaginary future where it exists and is widely used. Devices talk to each other directly, and maintain little routing tables of neighbors that let them forward packets.
Mesh will emerge and I'd bet on 802.11s being widely available in consumer electronics by 2020 or so.
Network discovery is finding out what other peers are on the network. Service discovery is learning what those peers can do for us.
The star topology is slowly dying and being replaced by clouds of clouds.
In a world of trillions of nodes, the ones you talk to most are the ones closest to you. This is how it works in the real world and it's the sanest way of scaling large-scale architectures.
One nice thing about software is to brute-force your way through the learning experience. As long as we're happy to throw away work, we can learn rapidly simply by trying things that may seem insane from the safety of the armchair.
The proper use of assertions is one of the hallmarks of a professional programmer.
Our confirmation bias as creators makes it hard to test our work properly. We tend to write tests to prove the code works, rather than trying to prove it doesn't.
To accept that we're fallible, and then to learn how to turn that into profit rather than shame is one of the hardest intellectual exercises in any profession. We leverage our fallibility by working with others and by challenging our own work sooner, not later.
Assertions are not a form of error handling. They are executable theories of fact. The code asserts, "At this point, such and such must be true" and if the assertion fails, the code kills itself.
The faster you can prove code incorrect, the faster and more accurately you can fix it.
Being able to fully test the real behavior of individual components in the laboratory can make a 10x or 100x difference to the cost of your project. That confirmation bias engineers have to their own work makes up-front testing incredibly profitable, and late-stage testing incredibly expensive.
Lesson is, test upfront so that when you plug the thing in, you know precisely how it's going to behave. If you haven't tested it upfront, you're going to be spending weeks and months in the field ironing out problems that should never have been there.
Brutal is good because it forces the design to a "good" or "bad" decision rather than a fuzzy "should work but to be honest there are a lot of edge cases so let's worry about it later".

20171230
