This isn't about payment technologies, it's not about isolating transactions, it's about scaling the middle layer. What's worse, it never even explains what the middle layer does.
No info on how routing works, no info on data synchronization.
Folks are just learning Kubernetes and writing extremely abstract stuff.
That said: still a nice write-up. Learning about some of the architectural choices that AMEX makes is definitely insightful (and relevant/useful to what I am working on right now as well!)
The router needs to be shard-aware. It needs to know what data is where based on the request coming in so that it can route accurately. A GLB is DNS. It cannot be shard-aware because all it knows is the FQDN being resolved.
It can be a "router" if all the router needs to know is to resolve to the nearest data center or the nearest CDN. But at that point I have to ask the question: why does one need a cell-based architecture, and why can't it just be geo-redundant active-active failover across regions?
In any case, the architecture itself isn't novel or new. It's documented here: https://docs.aws.amazon.com/wellarchitected/latest/reducing-.... It's the go-to model if you're running a cloud.
One can have GLBs that do routing. So long as the tenant-to-cell routing tables are consistent, it works fine. And those mappings tend not to change frequently.
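To make the tenant-to-cell routing table concrete, here is a minimal sketch. Everything in it (the `RoutingTable` class, the tenant IDs, the cell names, the default cell) is made up for illustration and is not taken from the article or from any particular load balancer.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RoutingTable:
    """Hypothetical tenant-to-cell mapping, published as one versioned unit."""
    version: int
    tenant_to_cell: Dict[str, str]
    default_cell: str

    def cell_for(self, tenant_id: str) -> str:
        # Each tenant maps to exactly one cell; unknown tenants fall back
        # to a default cell (an assumption of this sketch).
        return self.tenant_to_cell.get(tenant_id, self.default_cell)


# Illustrative data only.
table = RoutingTable(
    version=1,
    tenant_to_cell={"tenant-a": "cell-us-east-1", "tenant-b": "cell-eu-west-1"},
    default_cell="cell-us-east-1",
)
```

As long as every router instance holds the same `version` of the table, they all agree on where a tenant lives, which is the consistency requirement mentioned above.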
Granted. It works really well in practice. It should be noted we haven't actually had the world war the Internet was designed to survive. So we're not entirely clear on the semantics of operations in unusual and unexpected configurations. I would expect DNS to be the first shoe to drop there.
https://news.ycombinator.com/item?id=32023863
https://wso2.com/engineering-platform/developer-platform/doc...
Because of the title I was expecting to read about doing payments with a distributed network, like a terrorist cell network, or something like Hawala. Not (as I infer from other comments) Amex using multiple independent systems.
We use a cellular architecture to help constrain the blast radius of a modular monolith. Each one of our customers lives in exactly 1 cell. Any kind of cross-customer BI/reporting happens through a data warehouse.
The system I work on has such a property and the only real infra style approach is sync replication before responding to a caller and a delayed replica for delete/drop protections (say with a 2hr or more window).
You should also defend against this in your code (be able to reply from your initiating systems as well, etc.)
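A rough sketch of that combination, assuming in-memory stores and an injected clock instead of real databases: a write is acknowledged only after the primary and a synchronous replica have applied it, while a delayed replica lags by a fixed window so an accidental delete can still be recovered. The class and function names are hypothetical.

```python
from collections import deque


class Store:
    """Toy key-value store standing in for a real database."""

    def __init__(self):
        self.data = {}

    def apply(self, op, key, value=None):
        if op == "put":
            self.data[key] = value
        elif op == "delete":
            self.data.pop(key, None)


class DelayedReplica(Store):
    """Replica that applies operations only after delay_seconds have passed."""

    def __init__(self, delay_seconds):
        super().__init__()
        self.delay_seconds = delay_seconds
        self.pending = deque()

    def enqueue(self, timestamp, op, key, value=None):
        self.pending.append((timestamp, op, key, value))

    def catch_up(self, now):
        # Apply, in order, every operation older than the delay window.
        while self.pending and self.pending[0][0] <= now - self.delay_seconds:
            _, op, key, value = self.pending.popleft()
            self.apply(op, key, value)


def write(primary, sync_replica, delayed, now, op, key, value=None):
    primary.apply(op, key, value)
    # In a real system this would be a blocking call to another node.
    sync_replica.apply(op, key, value)
    delayed.enqueue(now, op, key, value)
    return "ack"  # only returned once both synchronous copies are updated
```

With a 2 hr window (`delay_seconds=7200`), a delete issued at time `t` does not reach the delayed replica until `catch_up` is called with `now >= t + 7200`, so the old value is still readable there in the meantime.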
Some CICS regions, a DB2 and a couple of VSAMs and that's it.
Any sufficiently complicated concurrent program in another language contains an ad hoc informally-specified bug-ridden slow implementation of half of Erlang.
Last week it required me to take pictures of my face from multiple angles to regain membership privileges. I suspect this may be part Palantir data collection and part Peter Thiel dating service.
This is why I find it best to declare a card stolen right before expiration or after.
Just kidding!
I find the idea quite good, and have to assume that the number of payment failures they experience due to partitions/outages isn't very high and that the post-payment reconciliation and reclamation process gives them the liberty to rank availability a bit higher than correctness.
One thing that looked a bit shaky was the interplay between the global transaction router's state of knowing which cells can handle a particular payment and the asynchronous distribution of the "failover data", which I presume it needs to know to route correctly. To me that seems to create a window where it might route to the wrong cell due to an outdated routing state.
It also doesn't go into the HA setup of the global transaction router itself.
But still, I kind of like the design.
But if the router sends to the wrong cell, the cell will either send it back to be rerouted, or it will fail and the router will try again (or report back the failure so upstream can try again, I assume).
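Roughly what I mean, as a sketch. It assumes a cell can tell that it doesn't own an account and may know who does; `WrongCellError`, `Cell` and `route` are invented names, not anything from the article.

```python
class WrongCellError(Exception):
    """Raised by a cell that does not own the account (hypothetical signal)."""

    def __init__(self, owner_hint=None):
        super().__init__("account not owned by this cell")
        self.owner_hint = owner_hint


class Cell:
    def __init__(self, name, owned, hints=None):
        self.name = name
        self.owned = set(owned)          # accounts this cell is responsible for
        self.hints = dict(hints or {})   # where it believes other accounts live

    def handle(self, account):
        if account not in self.owned:
            raise WrongCellError(self.hints.get(account))
        return f"{self.name} processed {account}"


def route(account, routing, cells, max_attempts=3):
    target = routing[account]  # possibly stale router state
    for _ in range(max_attempts):
        try:
            return cells[target].handle(account)
        except WrongCellError as err:
            if err.owner_hint is None:
                break  # nothing to retry against; let upstream decide
            target = err.owner_hint
            routing[account] = target  # refresh the stale entry
    raise RuntimeError(f"could not route {account}")


# Router thinks acct-1 is in cell "a", but it actually lives in "b".
cells = {"a": Cell("a", [], {"acct-1": "b"}), "b": Cell("b", ["acct-1"])}
routing = {"acct-1": "a"}
result = route("acct-1", routing, cells)
```

This only works if the cell itself has correct ownership information.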
But what if the cell doesn't know that, and it's holding, for example, a stale account number?