Data exchange

Coupling two products often comes with a plain old requirement:

As a user, I want Product A to exchange data with Product B

Don’t be fooled by the apparent simplicity of such a requirement. It deserves a lot more attention than it seems to at first glance, mainly because it is about data and because that data is part of the coupling API. Those two traits make the exchange:

Change

Likely to change. Data, like usage, will evolve. Period.

Evolve

Subtle to evolve. Remember that a data exchange is a contract between two parties. A contract implies trade-offs. Trade-offs imply negotiation. Negotiation takes time. And the outcome is not always guaranteed…

Maintain

Easy to get wrong, and costly or even impossible to maintain in that case.

Doing it right will save a lot of time, resources, and budget in the future (development vs maintenance ratio, …). But what does doing it right really mean here? Let’s rephrase the initial requirement, making the reasonably tacit expectations explicit:

As a user, I want Product A to exchange data with Product B, in a way which is:

Easy to set up

Easy to operate

Easy to debug

Resilient to data changes

Resilient to infrastructure changes

Of course, this is the ideal world. Most of the time, the notions of easy and resilient are subjective. And obviously, ticking all of those boxes is more complex, as it requires more design and/or work. So one may be tempted to strike through some of the items. Pay attention at this stage.

Striking through is an easy move; rolling it back a few months later is not.

Every seasoned developer has encountered such a situation. For example: develop the xxx facility for Windows and Linux. During the budget fight, aka the negotiation with the product owner, this mostly ends up with “Would it be cheaper to develop only for Windows (or Linux)?”. And, obviously, it often is (especially if we consider that we should develop and test on every OS. Not every language is smart enough to guarantee that you can write once and deploy everywhere. If you are lucky enough to have picked this kind of language, fight to keep it 😉). The main issue comes from the answer, a “Yes, …” kind of answer. It activates the well-known brain bias of trimming everything I do not want to hear or be aware of, leaving the terse form “Is it cheaper? Yes”. And, yes, you saw it coming: the meeting ends with the option struck through.

And guess what? A few weeks or months pass, and it is now time to rub out the strike. Here comes the painful state of play: tackling this feature will be costly… Should we blame the product owner for asking the question? Should we blame the team for a poor answer? Should we blame everyone for making that decision? The answer is none of them, because blaming is time-consuming and does not deliver any value. And it is not how a team works. We, as a team, failed. And now we do have a customer expectation. Our work, as a team, is to tackle this requirement. Period. This should not prevent us from being aware. From being careful. From being honest.

Remember. There are no little cheap moves. Everything has a cost. The question is not IF you will pay, it is WHEN.

And, as seasoned developers, you know better than anyone that it is far easier to pay upstream than downstream. That is what technical debt is all about, isn’t it? And debt is unlikely to shrink on its own.

Ensure that the ramifications and consequences of each choice are properly understood by every stakeholder, and committed to accordingly. By the way, that is the goal of this design stage.

Before tackling those different traits, there are two questions one should think about:

WHAT are the data made of?
WHY are the data used?

If those questions have not been answered in writing yet, do everyone involved a favor: write the answers down right now.

Be honest and accurate when answering those questions. E.g., when you are crafting an autonomous driving solution that involves sensor data exchange between two actors of the simulation, design and size the system so that it fits that purpose. Being able to handle sizable debug data structures along the way to ease diagnosis is certainly helpful, but it is not the primary intent. If you choose to design your system for that instead, you miss the target. There is rarely a one-size-fits-all solution. What appears smart for exchanging sizable data chunks could be an anti-pattern for exchanging tiny ones.

Design first and foremost according to the customer requirement.

Then, adapt accordingly. When there is such a discrepancy between data or usages, we will likely end up with two ways of doing it rather than a single one that poorly accommodates every use case. Split requirements. Remember that a requirement is persona-based (some stress this by starting the requirement with “As a…”, like we did above). And explore the field of possibilities.

If you are tempted to do so, refrain from making loud assumptions about the technical stack. That is the HOW, and it has to be dealt with afterwards. Starting by picking a technical stack will narrow your thinking and bring more biases than keys.

It is especially true when writing a functional user story. Do not extrapolate. Do not bypass. Prefer “exchanging data through a human-readable format” over “exchanging data through JSON format”, unless the latter is explicitly specified by the customer.

Reasoning this way is not specific to data. This principle is called agnosticity, and it supports building sustainable systems. Do not introduce context or knowledge that is not necessary to make the decision. Even if you are aware of it, pretend not to be, to avoid shortcuts, both in discussion and in decision making.

It is far – and by far I mean really far – easier to locally improve a generic solution than to promote a highly optimized version to genericity.

One may have noticed that we got here, both literally and in our brainstorming, without even mentioning a given technical choice or technological stack. Because we do not have to for now. Why should we bother with something we have no need for?

Before digging further, let’s assume that it is reasonable to consider three axes when dealing with data:

Format

Data format, i.e., how data are packed. Can be human-readable (.txt, .json) or binary, structured (.json) or unstructured (.txt), …

Manage

Data management, i.e., how data are identified/retrieved. Can be a data ID, a file path, …

Exchange

Data exchange, i.e., how data are conveyed from source to destination. Can be an ID payload, a link/file path payload, a full-data payload, …
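To make the three axes a bit more tangible, here is a minimal sketch in Python. It is an illustration only, not a recommendation of a stack: every name in it (Format, Exchange, Message, build_message, the sample file path) is hypothetical and chosen for this example. It simply shows that format, management, and exchange can be described independently of each other.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Format(Enum):
    """Format axis: how the data are packed (hypothetical values)."""
    TEXT = "text"      # human-readable, unstructured
    JSON = "json"      # human-readable, structured
    BINARY = "binary"  # not human-readable


class Exchange(Enum):
    """Exchange axis: how the data are conveyed (hypothetical values)."""
    ID = "id"           # the receiver looks the data up by identifier
    LINK = "link"       # the receiver fetches the data from a link or file path
    FULL_DATA = "full"  # the data travel inside the message itself


@dataclass(frozen=True)
class Message:
    fmt: Format
    exchange: Exchange
    reference: Optional[str] = None  # Manage axis: data ID or file path
    body: Optional[bytes] = None     # only used for a full-data payload


def build_message(fmt: Format, exchange: Exchange, *,
                  reference: Optional[str] = None,
                  body: Optional[bytes] = None) -> Message:
    """Build a message and check that it matches its exchange mode."""
    if exchange is Exchange.FULL_DATA:
        if body is None:
            raise ValueError("a full-data payload needs a body")
    elif reference is None:
        raise ValueError("an ID or link payload needs a reference")
    return Message(fmt, exchange, reference, body)


# A tiny structured record can travel inline...
inline = build_message(
    Format.JSON, Exchange.FULL_DATA,
    body=json.dumps({"speed": 12.5}).encode("utf-8"),
)

# ...while a sizable binary chunk is better conveyed by reference.
by_link = build_message(
    Format.BINARY, Exchange.LINK,
    reference="/data/lidar/frame_0001.bin",  # hypothetical path
)
```

The same Message shape covers both the tiny and the sizable case; only the values picked on each axis differ, which is the kind of decision the following traits help to make.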

Let’s pick up the different traits of the extended requirement, and explain what one should consider to make smart decisions…