>_ Analyst Engineering

You Don't Understand the System Until You Test It


A new analyst joined my team and asked for the documentation so they could “get up to speed.” I gave them the Confluence space, the sequence diagrams, the API specs, all of it. Two weeks later they still could not tell me what happened when a payment was rejected for a closed account. Not because they had not read the docs, but because the docs described the system as someone wished it worked, not as it actually did. So I sat them down and said: submit a payment, then follow it. We sent one pain.001 through the system and watched it move. In an afternoon they learned more than the diagrams had taught them in two weeks.

Here is the thing nobody tells you when you start: you do not really understand a system until you have tested it. Documentation tells you the intended behavior. Testing shows you the real behavior: the timing, the error codes, the half-second where a status sits in limbo, the message a user actually sees when something breaks. Running real and broken inputs through the system is the fastest way to learn how it works and how the user experiences it. This is where analysis and quality stop being separate jobs.

This article walks one payment from submission to its final status, through every hop, the way I teach it: the API call, the Kafka event, the database, the status endpoint, the logs, and the callback that carries the pain.002. Happy path first, then the unhappy path, because the unhappy path is where you learn the most.

Why diagrams and specs are not enough

A sequence diagram shows you the boxes and the arrows. It does not show you that the second service takes 400 milliseconds to respond, that a rejection comes back with reason code AC04 and not the AC01 the spec implied, or that the status sits at “received” for a beat before it moves to “accepted.” It does not show you what the customer sees while they wait.

When you test, all of that becomes visible. You feel the latency. You read the actual error payload. You see which field the system trusts and which it ignores. You discover that the “optional” field in the spec is in fact required by the downstream bank. None of that is in the diagram, and all of it shapes requirements and user experience.

This is the bridge between analysis and quality. The analyst who tests is not just checking that the build matches the spec. They are learning the system deeply enough to write better requirements next time, and to spot the gaps the spec never covered.

Set up the flow: one payment, followed end to end

Start with a real instruction. We submit a pain.001, the customer credit transfer initiation. If you want the full breakdown of why that is a pain message and what it becomes downstream, see PAIN vs pacs in ISO 20022. For testing, the point is simple: one pain.001 enters the system, and we follow it through every service until a pain.002 comes back telling the customer what happened.

The trick that makes this repeatable is daisy chaining requests with environment variables. Each request captures the identifiers the next one needs. You POST the payment, save the payment id and the UETR (Unique End-to-end Transaction Reference), then every later check reuses them. The chain mirrors the life of one transaction.

// 1) POST the pain.001 to the ingestion API; this runs as its post-response script.
// Capture the identifiers the rest of the chain needs.
bru.setEnvVar("paymentId", res.body.paymentId);
bru.setEnvVar("uetr", res.body.uetr);

expect(res.status).to.equal(202);
expect(res.body.status).to.equal("RCVD");

Now the identifiers live in {{paymentId}} and {{uetr}}, and every downstream request can use them.
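If you are not using Bruno, the same chain is easy to sketch in plain JavaScript. This is a minimal sketch, not Bruno's implementation: runChain and interpolate are hypothetical names, and the {{name}} syntax only imitates Bruno's variable interpolation. Each step receives the shared environment and returns whatever it captured for the steps after it.

```javascript
// Replace {{name}} placeholders with env values (imitates Bruno's syntax).
// Throws on an unset variable so a broken chain fails at the step that needs it.
function interpolate(template, env) {
  return template.replace(/\{\{(\w+)\}\}/g, (_, name) => {
    if (!(name in env)) throw new Error(`unset variable {{${name}}}`);
    return String(env[name]);
  });
}

// Run steps in order; whatever a step returns is merged into env for the next one.
async function runChain(steps, env = {}) {
  for (const step of steps) {
    const captured = await step(env, (template) => interpolate(template, env));
    Object.assign(env, captured);
  }
  return env;
}

// Usage sketch (postPayment and getJson are hypothetical HTTP helpers):
// await runChain([
//   async () => { const b = await postPayment(pain001); return { paymentId: b.paymentId, uetr: b.uetr }; },
//   async (env, fill) => ({ status: (await getJson(fill("/payments/{{paymentId}}"))).status }),
// ]);
```

Failing loudly on an unset variable is deliberate: if the POST never returned a paymentId, the chain stops at the step that needed it instead of requesting /payments/{{paymentId}} literally.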

Step 1: confirm the event reached Kafka

The ingestion service does not process the payment itself. It validates the message and publishes an event, say “payment.received”, to a Kafka topic. So the first downstream check is the event.

Spin up a test consumer with kafkajs, subscribe to the topic, and look for the event carrying your UETR. If it is there with the right type and payload, the handoff worked. If it is missing or malformed, you have learned something the diagram could not tell you: the failure is upstream, in ingestion, not in the processor everyone assumed was broken.

// 2) Assert the event was published, keyed by the UETR we captured.
// waitForEvent is a helper you write around a kafkajs consumer, not a kafkajs API.
const uetr = bru.getEnvVar("uetr");
const event = await waitForEvent("payment.received", { uetr });
expect(event).to.exist;
expect(event.value.uetr).to.equal(uetr);

Watching the event flow is also how you learn the system’s real shape. You see which services are event-driven and which are synchronous, and you feel where the asynchronous gaps are, the places a user might refresh and see nothing yet.
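The waitForEvent call in the step 1 snippet is a helper you write yourself. Here is a minimal sketch of one, assuming the events are JSON with type and uetr fields (check your real schema; the type might live in a header instead). makeEventWaiter is a hypothetical name; the message objects follow the shape kafkajs passes to eachMessage, with value as a Buffer or null.

```javascript
// Returns { matched, onMessage }: feed every consumed message to onMessage;
// matched resolves with the first event whose type and UETR both match.
function makeEventWaiter({ type, uetr, timeoutMs = 10000 }) {
  let resolveMatch;
  let timer;
  const matched = new Promise((resolve, reject) => {
    resolveMatch = resolve;
    timer = setTimeout(
      () => reject(new Error(`no ${type} event for UETR ${uetr} within ${timeoutMs} ms`)),
      timeoutMs
    );
  });

  function onMessage(message) {
    if (!message.value) return false; // tombstone or empty payload
    let payload;
    try {
      payload = JSON.parse(message.value.toString("utf8"));
    } catch {
      return false; // not JSON: skip it, though a malformed event is itself a finding
    }
    if (payload.type !== type || payload.uetr !== uetr) return false;
    clearTimeout(timer);
    resolveMatch({ key: message.key ? message.key.toString() : null, value: payload });
    return true;
  }

  return { matched, onMessage };
}

// Wiring it to kafkajs (topic and group names are placeholders):
// const waiter = makeEventWaiter({ type: "payment.received", uetr });
// const consumer = kafka.consumer({ groupId: `e2e-${Date.now()}` });
// await consumer.connect();
// await consumer.subscribe({ topic: "payments", fromBeginning: false });
// await consumer.run({ eachMessage: async ({ message }) => { waiter.onMessage(message); } });
// ...POST the payment here, once the consumer has joined the group...
// const event = await waiter.matched;
// await consumer.disconnect();
```

One timing trap: a fresh consumer group with fromBeginning: false only receives messages published after it has joined the group, so start the consumer before you POST (kafkajs emits consumer.events.GROUP_JOIN when it has joined). Setting fromBeginning: true avoids the race, at the cost of scanning the whole topic.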

Step 2: check the database state

Events move the payment, but state lives in the database. The processing service writes a row to the payments table. Query it directly through your cloud data API (for example the AWS RDS Data API or a DynamoDB call) and assert the stored state matches what you expect.

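// 3) Query the payments table for the row this flow created.
// db.query is a thin wrapper you write over your data API; it returns an array of rows.
const uetr = bru.getEnvVar("uetr");
const [row] = await db.query(
  "select status, reason_code from payments where uetr = :uetr",
  { uetr }
);
expect(row, "no payments row for this UETR").to.exist;
expect(row.status).to.equal("ACCP");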

Reading the database during a test teaches you the data model in a way no entity diagram does. You see which fields are populated when, which start null and fill in later, and what a partially processed payment actually looks like at rest. That knowledge makes your next set of requirements sharper.
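The db.query helper in the snippet is deliberately vague. If your database is Aurora behind the RDS Data API, here is a sketch of what it might wrap, assuming the AWS SDK for JavaScript v3 (@aws-sdk/client-rds-data). The request builder and the parser are plain functions so they can be checked without AWS; buildStatusQuery and parseRows are hypothetical names, and the ARNs, database, table and column names are placeholders.

```javascript
// Input for the RDS Data API ExecuteStatement call.
function buildStatusQuery({ resourceArn, secretArn, database, uetr }) {
  return {
    resourceArn,
    secretArn,
    database,
    sql: "select status, reason_code from payments where uetr = :uetr",
    // Bind the UETR as a named parameter rather than concatenating it into SQL.
    // If uetr is a PostgreSQL uuid column, add typeHint: "UUID" to the parameter.
    parameters: [{ name: "uetr", value: { stringValue: uetr } }],
    // Return rows as one JSON string instead of arrays of typed field values.
    formatRecordsAs: "JSON",
  };
}

// formattedRecords looks like '[{"status":"ACCP","reason_code":null}]'.
function parseRows(response) {
  return response.formattedRecords ? JSON.parse(response.formattedRecords) : [];
}

// With AWS SDK for JavaScript v3:
// const { RDSDataClient, ExecuteStatementCommand } = require("@aws-sdk/client-rds-data");
// const out = await new RDSDataClient({}).send(new ExecuteStatementCommand(buildStatusQuery({ /* ... */ })));
// const [row] = parseRows(out);
```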

Step 3: poll the status endpoint, POST then GET

The customer does not read your database. They poll a status endpoint. So model what they do: after the POST, GET the payment and watch the status move.

// 4) Poll the status the way the customer's channel would
// GET /payments/{{paymentId}}
expect(res.body.status).to.be.oneOf(["RCVD", "ACCP", "ACSP"]);

This POST then GET rhythm is where you feel the user experience. How long until the status is meaningful? Does it jump straight to accepted, or pass through an intermediate state a user might find confusing? Is “accepted” the same as “settled,” and does the UI make that clear? You only ask these questions once you have watched the status move with your own eyes.
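A single GET only shows one frame. To actually watch the status move, poll until it reaches a final state and record each change with a timestamp; that timeline is the evidence for the latency and UX questions. A minimal sketch: pollStatus is a hypothetical helper, getStatus is injected (a fetch call in real use, a stub in a unit test), and which statuses count as terminal is an assumption you make per system.

```javascript
// Calls getStatus until a terminal status appears, recording each status
// change and how many ms after the first poll it was seen.
async function pollStatus(getStatus, { terminal, intervalMs = 250, timeoutMs = 10000 }) {
  const started = Date.now();
  const timeline = []; // [{ status, atMs }], one entry per change
  for (;;) {
    const status = await getStatus();
    const atMs = Date.now() - started;
    const last = timeline[timeline.length - 1];
    if (!last || last.status !== status) timeline.push({ status, atMs });
    if (terminal.includes(status)) return timeline;
    if (atMs >= timeoutMs) {
      throw new Error(`still ${status} after ${atMs} ms: ${JSON.stringify(timeline)}`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Real use with Node 18+ fetch (baseUrl and the terminal list are assumptions):
// const timeline = await pollStatus(
//   async () => (await (await fetch(`${baseUrl}/payments/${paymentId}`)).json()).status,
//   { terminal: ["ACSC", "RJCT"] }
// );
```

Recording only changes keeps the timeline short and readable: a result like RCVD at 0 ms, ACCP at 1800 ms tells you exactly how long a customer stares at “received.”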

Step 4: confirm observability tells the truth

A payment that works but cannot be traced is a problem waiting to happen. So check the logs. Query your observability tool, Splunk for example, through its API, filtering on the correlation id or UETR you have been carrying since the first POST.

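// 5) Confirm each hop logged the transaction, with no errors.
// splunk.search is a wrapper you write over Splunk's REST search API; it returns result rows.
const uetr = bru.getEnvVar("uetr");
const rows = await splunk.search(
  `search index=payments uetr=${uetr} | stats count by service, level`
);
expect(rows.some(r => r.level === "ERROR")).to.equal(false);
// Service names are placeholders: list every hop the payment should pass through.
["ingestion", "processor"].forEach(s => expect(rows.some(r => r.service === s)).to.equal(true));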

Now you learn the system’s nervous system: which services log richly and which are silent, and whether the correlation id actually flows end to end or breaks at a service boundary, leaving you blind during a production incident. That is a real finding, and you only get it by looking.
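The splunk.search wrapper in the snippet can stay thin. A sketch under stated assumptions: it posts a oneshot search to Splunk's REST endpoint /services/search/jobs on the management port (8089 by default), then checks which expected hops logged the UETR. The index, the uetr and service field names, and the service list are assumptions about your logging setup; buildOneshotBody and checkLogCoverage are hypothetical names, and the HTTP call with its auth header is left out so the logic can be checked on its own.

```javascript
// Form body for POST https://<splunk-host>:8089/services/search/jobs.
// exec_mode=oneshot returns results in the response instead of a search job id.
// The REST API needs the leading "search" command the search bar adds for you.
function buildOneshotBody(uetr) {
  return new URLSearchParams({
    search: `search index=payments uetr="${uetr}" | stats count by service, level`,
    exec_mode: "oneshot",
    output_mode: "json",
  });
}

// rows: the "results" array of the JSON response, one row per service/level pair.
function checkLogCoverage(rows, expectedServices) {
  const seen = new Set(rows.map((row) => row.service));
  return {
    // Hops with no log line carrying this UETR: the service is silent,
    // or the UETR never reached it.
    missing: expectedServices.filter((service) => !seen.has(service)),
    // Services that logged at ERROR level for this payment.
    errors: rows.filter((row) => row.level === "ERROR").map((row) => row.service),
  };
}
```

Returning what is missing, rather than a bare pass or fail, is the point: a non-empty missing list is precisely the broken-correlation finding worth raising.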

Step 5: assert the callback returns the expected pain.002

Finally, the payload that closes the loop. The system sends a callback containing the pain.002, the customer payment status report. Assert it carries the right group status and the same identifiers you started with.

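// 6) The pain.002 callback should confirm acceptance, same UETR.
// Assumes a mock receiver captured the callback and this request reads it back,
// with the pain.002 rendered as JSON using these field names.
expect(res.body.orgnlGrpInfAndSts.grpSts).to.equal("ACCP");
expect(res.body.uetr).to.equal(bru.getEnvVar("uetr"));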

When the pain.002 comes back ACCP with your UETR intact, the happy path is proven, not on paper, but end to end through every service.
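In a test environment the callback needs somewhere to land. Here is a minimal sketch of a mock receiver using only Node's http module, assuming the system can be configured with a callback URL you control and posts the pain.002 as JSON with a top-level uetr field, as the step 5 snippet assumes. startCallbackReceiver is a hypothetical name.

```javascript
const http = require("http");

// Starts a local server that stores each POSTed callback, keyed by UETR.
function startCallbackReceiver(port = 0) {
  const received = new Map(); // uetr -> parsed pain.002 body
  const server = http.createServer((req, res) => {
    if (req.method !== "POST") {
      res.writeHead(405).end();
      return;
    }
    let raw = "";
    req.setEncoding("utf8");
    req.on("data", (chunk) => (raw += chunk));
    req.on("end", () => {
      try {
        const body = JSON.parse(raw);
        received.set(body.uetr, body);
        res.writeHead(204).end();
      } catch {
        res.writeHead(400).end(); // not JSON: store nothing, tell the sender
      }
    });
  });
  return new Promise((resolve) => {
    // Port 0 lets the OS pick a free port; read it back from server.address().
    server.listen(port, "127.0.0.1", () =>
      resolve({ server, received, port: server.address().port })
    );
  });
}
```

It keeps callbacks in memory, so a test in the same process can look one up with received.get(uetr); for a Bruno request to read it back, you would add a GET route that returns the stored body.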

The unhappy path is where you learn the most

The happy path tells you the system works. The unhappy path tells you how it behaves when reality goes wrong, and that is where the real learning lives.

Submit the same flow with a broken input. Use a closed beneficiary account, or an amount over the limit. Then follow it and watch where and how it fails:

  • The POST may still return 202 “received,” because the checks that matter here, such as whether the account is open, happen downstream of ingestion. Already a UX insight: the customer is told “received” before the account has been checked at all.
  • Kafka may show the usual “payment.received” followed by a “payment.rejected” from the processor, or ingestion may publish the rejection itself. Either way, you learn which service owns the decision.
  • The database row shows status RJCT with reason code AC04 for a closed account. You learn the real reason codes the system uses, not the ones the spec guessed.
  • The GET returns RJCT, and you see exactly what the customer’s channel has to render.
  • The pain.002 callback comes back RJCT with a reason code. Now you know what the customer actually finds out, and whether it is clear enough to act on.
  • Splunk shows the rejection logged, with the reason, at the service that made the call. Or it does not, which is a finding worth raising.
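Writing the unhappy paths down as data keeps them repeatable: each case names one broken input and what you expect at each hop, and the same chain runs for every row. A sketch with hypothetical names (unhappyCases, diffOutcome) and assumed expectations. AM02 is the ISO 20022 reason code for an amount above the allowed maximum, but whether your system uses it is exactly what the test finds out.

```javascript
// Each case: a broken input and the outcome expected at each hop.
// Fixture values and reason codes are assumptions to confirm, not facts.
const unhappyCases = [
  {
    name: "closed beneficiary account",
    // Hypothetical fixture: an account your test bank has marked closed.
    mutate: (payment) => ({ ...payment, creditorIban: "CLOSED-ACCOUNT-FIXTURE" }),
    expected: { postStatus: 202, finalStatus: "RJCT", reasonCode: "AC04" },
  },
  {
    name: "amount over limit",
    // Assumed to exceed the configured limit.
    mutate: (payment) => ({ ...payment, amount: "1000000000.00" }),
    expected: { postStatus: 202, finalStatus: "RJCT", reasonCode: "AM02" },
  },
];

// Compare what a run observed with what the case expected; [] means it matched.
function diffOutcome(expected, observed) {
  return Object.keys(expected)
    .filter((key) => expected[key] !== observed[key])
    .map((key) => `${key}: expected ${expected[key]}, got ${observed[key]}`);
}
```

A mismatch such as “reasonCode: expected AC04, got AC01” is not necessarily a bug; it may be the spec that was wrong, which is a requirement finding in its own right.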

Every edge case you try teaches you something the documentation left out. A timeout mid-flow. A duplicate submission with the same UETR. A valid message with an unsupported currency. Each one shows you a behavior, a status, and a message a real user could hit, and each one makes you a better analyst because now you know the system, not the story about the system.

What this does for your requirements

Testing like this changes how you write requirements. You stop writing “the system shall reject invalid payments” and start writing “a closed beneficiary account returns a pain.002 with status RJCT and reason code AC04 within two seconds, surfaced to the customer as a clear message.” You write acceptance criteria that match reality because you have seen reality.
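A criterion written this precisely can be turned straight into a check. A sketch, assuming the pain.002 arrives as JSON whose field names mirror the ISO 20022 elements (OrgnlGrpInfAndSts, GrpSts, StsRsnInf/Rsn/Cd) in camelCase; checkRejectionCriterion is a hypothetical name, and the timestamps are whatever your test records at submission and on receipt of the callback.

```javascript
// Executable form of the closed-account acceptance criterion.
// Returns a list of failures; an empty list means the criterion was met.
function checkRejectionCriterion(pain002, submittedAtMs, receivedAtMs, maxMs = 2000) {
  const group = pain002.orgnlGrpInfAndSts ?? {};
  // StsRsnInf can repeat and Rsn is a choice of Cd or Prtry; this reads the first Cd only.
  const reason = group.stsRsnInf?.[0]?.rsn?.cd;
  const elapsedMs = receivedAtMs - submittedAtMs;
  const failures = [];
  if (group.grpSts !== "RJCT") failures.push(`status ${group.grpSts}, expected RJCT`);
  if (reason !== "AC04") failures.push(`reason ${reason}, expected AC04`);
  if (elapsedMs > maxMs) failures.push(`took ${elapsedMs} ms, limit ${maxMs} ms`);
  return failures;
}
```

The last clause of the criterion, a clear message for the customer, still needs a human looking at the screen; code can only prove the status, the reason code and the timing.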

This is the whole argument for bridging analysis and quality. The same person who can trace a flow across microservices and read the logs and query the database writes better specs, catches gaps earlier, and understands the user experience at a level no diagram delivers. If you want the deeper toolkit for this kind of work, it is in the guides.

The takeaway

Read the documentation, study the diagrams, then put them down and submit a payment. Follow it through the API, the event, the database, the status endpoint, the logs, and the callback. Run the happy path until it is boring, then break it on purpose and learn how it fails.

You will understand the system, and the people who use it, in a way that no specification ever taught you. That is the bridge between analysis and quality, and it is built one tested payment at a time.

Frequently asked questions

Why is testing better than documentation for understanding a system?

Documentation tells you how the system is supposed to behave. Testing shows you how it actually behaves, including the timing, the error messages, and the edge cases the diagram never mentioned. Running real and broken inputs through the system is the fastest way to learn how it really works and how the user experiences it.

How do you test a payment flow end to end across microservices?

Submit a real payment instruction such as a pain.001, then follow it through every hop: confirm the API response, check the event published to Kafka, query the database for the stored state, poll the status endpoint, verify the logs in your observability tool, and assert the callback returns the expected pain.002. You test the whole chain, not one service in isolation.

What is daisy chaining API requests in testing?

Daisy chaining means running a sequence of requests where each one feeds the next. You POST a payment, capture the returned payment id and UETR into environment variables, then reuse those variables in the follow-up GET, the Kafka check, the database query, and the log search. The chain mirrors the real flow of a single transaction.

How do you verify a Kafka event was published during a test?

Subscribe to the topic with a test consumer (for example using kafkajs), filter by the UETR or correlation id you captured from the POST response, and assert the event exists with the expected type and payload. If the event is missing or malformed, the failure is upstream of any downstream service.

What is the difference between happy path and unhappy path testing?

Happy path testing confirms a valid input flows through and settles as expected. Unhappy path testing feeds invalid or edge-case inputs, such as a closed beneficiary account or an amount over a limit, and checks that the system rejects them correctly, with the right reason code, the right status, and a clear message back to the user.
