Data costs money to gather and store – and pays no dividend when resting in a silo. Organisations that want to maximise the value of their data must therefore act upon it while it is fresh, relevant, and still accurate. How can they do this most effectively?
In many cases, the key is to break down the processes that they rely on into a series of microservices. These can act upon the data itself, or on the output of other microservices sited further up the chain. To do this, they must be fed a stream of input, often using a pub/sub messaging system.
Pub/sub and real-time data analytics
Originally created at Yahoo and now an Apache project, Pulsar is one such pub/sub messaging system. Pub/sub is shorthand for publish/subscribe, a service-to-service messaging pattern used to connect a variety of serverless and microservices systems.
When a service publishes data through Pulsar (the ‘pub’ half of the equation), every service subscribed to the relevant topic (‘sub’) receives a copy at the same time and can act upon it. Because the microservices are not working with the primary copy of the data, they can apply whatever transforms they require to achieve a desired outcome without jeopardising the raw data on which that outcome is based.
As Google explains of its own Cloud Pub/Sub service, “Pub/Sub is used for streaming analytics and data integration pipelines to ingest and distribute data. It is equally effective as a messaging-oriented middleware for service integration or as a queue to parallelize tasks”.
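To make the pattern concrete, here is a minimal sketch using the Apache Pulsar Python client (installable with pip install pulsar-client). The broker URL, topic, and subscription names are placeholders invented for illustration, not anything prescribed by Pulsar itself.

```python
import pulsar

# Connect to a Pulsar broker; the URL is a placeholder for illustration.
client = pulsar.Client('pulsar://localhost:6650')

# Each named subscription receives its own copy of every message, so a
# microservice can transform the data without touching the raw feed.
# Subscribing before publishing ensures this example doesn't miss the
# message, as a new subscription only sees later publications by default.
consumer = client.subscribe('orders', subscription_name='analytics')

# The publisher writes to a named topic and knows nothing about
# who, if anyone, is subscribed to it.
producer = client.create_producer('orders')
producer.send(b'{"order_id": 1001, "status": "placed"}')

msg = consumer.receive()
print(msg.data())
consumer.acknowledge(msg)

client.close()
```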
The benefits of pub/sub
Pulsar is not the only pub/sub messaging system. Amazon Simple Notification Service (SNS), Google Cloud Pub/Sub, Ably Realtime, and Azure Web PubSub each deliver similar outcomes for organisations already committed to those platforms.
The benefits of such a service are manifold, since it effectively decouples the broadcast data from its use, in much the same way that an API exposes a range of hooks from which downstream developers can hang dependent services. The data publisher need only concern themselves with the mechanism for publication, not with how the data is eventually used by whichever microservice receives the update. For security, they can even encrypt the data, so long as the microservice developer relying on it holds the necessary keys for decryption.
Where pub/sub differs from an API, or from, say, an XML or JSON feed that must be retrieved from a server, is that pub/sub platforms are push-driven. Because subscribed services don't need to continually poll the publishing server for updates, resource consumption is reduced.
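The push model is visible in the Pulsar Python client's message listener, sketched below with the same placeholder names as before: the client invokes the callback as messages arrive, and the service never polls the broker.

```python
import time
import pulsar

def on_message(consumer, msg):
    # Pulsar pushes each message into this callback as it arrives;
    # there is no polling loop anywhere in the service.
    print('received:', msg.data())
    consumer.acknowledge(msg)

client = pulsar.Client('pulsar://localhost:6650')
client.subscribe('orders', 'notifications', message_listener=on_message)

time.sleep(30)  # keep the process alive while messages are delivered
client.close()
```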
Optimised platforms
Because the data is system-agnostic, architectures can be tailored to each task, giving organisations the ability to mix otherwise incompatible hardware, operating environments, databases, and even languages to optimise the development and operating environment.
Moreover, exposing data this way allows for simultaneous updates to multiple remote databases, replication across a distributed network for load balancing, or for several applications or microservices to work with the data at the same time. These applications or services could each be performing discrete, bespoke tasks, which would multiply the value of the published data, or could undertake the same task simultaneously to reduce execution time.
For example, if an individual microservice becomes a pinch point, it can be replicated, or have additional resources assigned to it, without the same level of resources being assigned to any other microservice within the overall system. In this way, organisations can optimise a system – and, as a result, any user experience relying on it – at the lowest possible expense.
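In Pulsar's terms, these two modes map onto subscription types. Below is a minimal sketch with placeholder topic and subscription names: separate subscriptions give each service its own full copy of the stream, while replicas of a pinch-point service can join one Shared subscription and divide the messages between them.

```python
import pulsar

client = pulsar.Client('pulsar://localhost:6650')

# Discrete tasks: each named subscription receives every message,
# multiplying the value of the published data.
audit = client.subscribe('orders', 'audit-log')
billing = client.subscribe('orders', 'billing')

# Scaling a pinch point: consumers sharing one Shared subscription
# divide the messages between them, so adding replicas adds throughput
# without assigning resources to any other microservice in the system.
worker_a = client.subscribe('orders', 'heavy-task',
                            consumer_type=pulsar.ConsumerType.Shared)
worker_b = client.subscribe('orders', 'heavy-task',
                            consumer_type=pulsar.ConsumerType.Shared)
```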
Pub/sub and microservices development
Pub/sub can be used to break down a system into component parts. For example, an online store may separate out the various mechanisms for displaying its catalogue, adding items to a basket, rendering the checkout, taking payment, closing the order, and issuing a confirmation and receipt.
If each of these steps were a component within a monolithic application, development would be slower than if they were deployed as a sequence of microservices subscribing to data published by their predecessors. As far as the end user is concerned, the outcome would be unchanged, but the enterprise developing the store would be able to optimise individual services and swap them out at appropriate points within the chain. It could also more easily integrate third-party components and data sources.
Further, as data can be retrieved by multiple services simultaneously, the checkout, confirmation, and stock control systems could receive the same data at the same time, even if the confirmation and stock control systems only act upon it when they receive a trigger from the payment service. Likewise, subscribing the catalogue display mechanism to data published by the basket microservice effectively closes a loop: the store can temporarily adjust the stock levels it displays, or warn shoppers that ‘x’ customers already have an item in their baskets, potentially inducing an impulse purchase.
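To illustrate the trigger described above, here is a hedged sketch assuming a ‘payment-completed’ topic; the topic and subscription names are invented for illustration.

```python
import json
import pulsar

client = pulsar.Client('pulsar://localhost:6650')

# Confirmation and stock control hold separate subscriptions on the
# same topic, so both receive the trigger at the same time.
confirmation = client.subscribe('payment-completed', 'confirmation')
stock_control = client.subscribe('payment-completed', 'stock-control')

# The payment microservice publishes one event when payment clears;
# it needs no knowledge of the services acting on it downstream.
payments = client.create_producer('payment-completed')
payments.send(json.dumps({'order_id': 1001, 'paid': True}).encode('utf-8'))
```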
Although the above example, an online store, relies on each service within the chain operating as intended for a sale to be processed, Merit's engineers also use pub/sub systems to support the microservices model in less linear applications, where it increases robustness.
For example, where multiple microservices subscribe to, and act upon, a single published data source, the failure of one microservice should not impact any of the others: having no interest in the state of the other services sharing the data source, they can continue operating unimpeded. This is less likely to be the case with a monolithic system, where the data, both generated and worked upon, remains within a single loop.
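This isolation extends to recovery. In the sketch below, which assumes the same placeholder topic and a hypothetical process() handler, a service that fails before acknowledging a message leaves it on its own subscription's backlog for redelivery; subscriptions held by other services are unaffected.

```python
import pulsar

def process(payload: bytes) -> None:
    """Hypothetical business logic; may raise on failure."""
    print('updating stock from', payload)

client = pulsar.Client('pulsar://localhost:6650')
consumer = client.subscribe('payment-completed', 'stock-control')

msg = consumer.receive()
try:
    process(msg.data())
    consumer.acknowledge(msg)  # only acknowledged messages are removed
except Exception:
    # Ask the broker to redeliver later; other services' subscriptions
    # on the same topic are entirely unaffected by this failure.
    consumer.negative_acknowledge(msg)
```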
The benefits of real-time analytics
The kind of real-time data analytics enabled by pub/sub systems like Pulsar, Amazon SNS, and Azure Web PubSub allows organisations to support microservices at scale. It doesn't matter how many microservices are accessing the data, or why, so long as the data continues to flow. This allows diverse, simultaneous use cases to deliver responsive outcomes.
Whether these outcomes are focused on media analysis, identifying market trends, or delivering an end-user experience, such as timely notifications and order status updates, pub/sub adds a layer of intelligence to the composite system through more effective, timely, and granular analysis of a single, replicated and distributed data stream. It also means organisations can iterate their applications more quickly, helping them to gain, or maintain, a lead over their rivals.
Merit Group’s expertise in Event Stream Processing
At Merit Group, we work with some of the world’s leading B2B intelligence companies like Wilmington, Dow Jones, Glenigan, and Haymarket. Our data and engineering teams work closely with our clients to build data products and business intelligence tools. Our work directly impacts business growth by helping our clients to identify high-growth opportunities.
Our specific services include high-volume data collection, data transformation using AI and ML, web watching, BI, and customised application development.
The Merit team also brings to the table deep expertise in building real-time data streaming and data processing applications. Our data engineering team has specific expertise in a wide range of data tools including Airflow, Kafka, Python, PostgreSQL, MongoDB, Apache Spark, Snowflake, Tableau, Redshift, Athena, Looker, and BigQuery.
If you’d like to learn more about our service offerings, please contact us here: https://www.meritdata-tech.com/contact-us
Related Case Studies
01 / A Digital Engineering Solution for High Volume Automotive Data Extraction
An automotive products client required help to track millions of price points and specification details for a large range of vehicles.

02 / Bespoke Data Engineering Solution for High Volume Salesforce Data Migration
A global market leader in credit risk and ratings needed a data engineering solution for Salesforce data migration.