
What is Batch and Stream Processing?

ufuktunca5

These days we process lots of data on our servers. Exposing a REST API is one option when designing a distributed system: the server takes a request, processes it, and returns a response as quickly as possible. But this isn't the only way to process data. Sometimes we need to handle much bigger data sets, such as computing user analytics or indexing web pages. Let's inspect some of these approaches.


Batch Processing

Batch processing systems handle large amounts of data, but that data has a defined start and end point. The system then produces output according to the input. Batch jobs are usually scheduled, so the user doesn't need to sit and watch for the output; these kinds of processes can take hours or even days to finish.


There are different programming models for batch processing. One of the best known is MapReduce, published in 2004. It is sometimes called "the algorithm that makes Google so massively scalable", since Google used MapReduce to index web pages. Its importance is decreasing day by day, but it is still worth learning.


Unix Pipe

Processing a large amount of data with batch processing separates that work into stages. We can think of the pipe symbol "|" in Unix, which lets us connect the output of one program to the input of another.
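The original example here was an image; the pipeline it showed was presumably along these lines, a minimal sketch:

```shell
# List the files in the current directory, then sort the names in
# reverse order. The pipe feeds ls's output into sort's input.
ls | sort -r
```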



In the example above, we first list the files, then sort them in reverse order. In this way, one program's output becomes another program's input.


After Doug McIlroy invented pipes in Unix, the idea of connecting programs with pipes became part of what is now known as the Unix philosophy: a set of design principles that became popular among developers. These are the principles as described in 1978:


1- Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new “features”.


2- Expect the output of every program to become the input to another, as yet unknown, program. Don’t clutter output with extraneous information. Avoid stringently columnar or binary input formats. Don’t insist on interactive input.


3- Design and build software, even operating systems, to be tried early, ideally within weeks. Don’t hesitate to throw away the clumsy parts and rebuild them.


4- Use tools in preference to unskilled help to lighten a programming task, even if you have to detour to build the tools and expect to throw some of them out after you’ve finished using them.


These principles anticipated the Agile and DevOps movements, long before they had those names, and have changed only slightly since 1978. So how does MapReduce process a large amount of data?


MapReduce

Just as Unix separates its processing into stages via pipes, MapReduce separates its work into stages. It can run on top of distributed file systems such as the Hadoop Distributed File System, the Quantcast File System, or GlusterFS.



  • First, we read the whole data set from our distributed storage and break it into records.

  • Second, we map the records, extracting a key and a value from each piece of data.

  • Third, we sort (shuffle) the mapped data by key.

  • Fourth, we call the reducer, which merges the values for each key.



While doing all of this, the job can run on hundreds of machines at the same time. As a result, we get our batch processing output.
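The four steps above can be sketched in a few lines of Python. This is a single-machine illustration of the model (a word count, the classic MapReduce example), not the distributed implementation itself:

```python
from collections import defaultdict

def map_phase(records):
    # Map: extract (key, value) pairs from each record.
    # Here the key is a word and the value is a count of 1.
    for record in records:
        for word in record.split():
            yield (word, 1)

def shuffle_phase(pairs):
    # Shuffle/sort: group all values that share the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: merge each key's values into a final result.
    return {key: sum(values) for key, values in groups.items()}

records = ["the cat sat", "the dog sat"]
result = reduce_phase(shuffle_phase(map_phase(records)))
print(result)  # {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```

In a real MapReduce job, the map and reduce phases each run in parallel across many machines, and the shuffle moves data between them over the network.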


 

Is Batch Processing Enough For Today?

For some particular areas, yes. But let's say we want the last week's analytics data, right up to this moment. With batch processing we would need to wait until the end of the day, because today's data is not finished yet. To solve that problem, we need Stream Processing.


Stream Processing

These days, stream processing is taking the place of batch processing in many areas, because it can process new data as it arrives in our distributed database. So we don't need to wait until the end of the day to receive our analytics result.


For stream processing, we can think of message systems such as Kafka and RabbitMQ.
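The idea can be sketched without a real broker. In this minimal example, a Python `queue.Queue` stands in for a message-broker topic (the events and user names are made up for illustration): a producer publishes events as they happen, and a consumer updates a running analytics result per event instead of waiting for a scheduled batch run:

```python
import queue
import threading

# The queue stands in for a broker topic (e.g. a Kafka topic).
events = queue.Queue()
totals = {}

def producer():
    # Publish events as they occur.
    for user, amount in [("alice", 3), ("bob", 5), ("alice", 2)]:
        events.put((user, amount))
    events.put(None)  # sentinel: end of stream

def consumer():
    # Process each event immediately, keeping analytics up to date.
    while True:
        event = events.get()
        if event is None:
            break
        user, amount = event
        totals[user] = totals.get(user, 0) + amount

t = threading.Thread(target=producer)
t.start()
consumer()
t.join()
print(totals)  # {'alice': 5, 'bob': 5}
```

A real system like Kafka adds durability, partitioning, and many independent consumers, but the shape is the same: events in, continuously updated results out.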



These kinds of stream processing methods can also be used in other ways, such as:


  • Fraud detection systems need to determine if the usage patterns of a credit card have unexpectedly changed, and block the card if it is likely to have been stolen.

  • Trading systems need to examine price changes in a financial market and execute trades according to specified rules.

  • Military and intelligence systems need to track the activities of a potential aggressor, and raise the alarm if there are signs of an attack.

