Introduction to Web Workers

March 06, 2019

Modern web apps keep growing more complex, and that complexity comes at a price: performance.

JavaScript is a single-threaded language, which means our code runs on only one thread. Languages like Java and C/C++ support multi-threading: there, we can split off a particular piece of code to run in another thread.

What is multi-threading?

Multi-threading in computing is a system whereby pieces of code in the same address space can run in parallel with each other.

Multithreading refers to the ability of an OS to support multiple, concurrent paths of execution within a single process.

Example:

Let’s say we have a factory that packages food in small tins. Our assembly line consists of:

  • processing the food
  • cleaning the tins
  • filling the tins with food
  • labeling the tins
  • sealing the tins

A computer runs our program by loading it from disk into RAM. An entry point is set and jumped to, and the CPU begins fetching instructions and executing them one after another. Every program has an address space: an isolated region of memory allocated by the OS in which the program executes, so that it cannot corrupt or wander into another program’s space. Within the address space, the OS also sets up the heap, the stack, the code area, and the data area.

When the program reaches the point where it creates a thread, the thread is given its own stack inside the main program’s address space. From then on, the OS alternates between the two: it runs the main program’s code for a certain amount of time, saves its context and registers, and switches to the thread. When the thread has run for its own slice of time, the OS saves the thread’s registers, loads the main program’s previously saved registers, and continues from where it left off.

What does it mean to save context and registers?

Suppose you are watching the movie Avengers: Infinity War and want to take a break, but you want to be able to come back and resume from the exact point where you stopped. You record the movie name and the exact frame and time you have reached in the movie’s timeline, so when you decide to resume you simply jump to that point and continue.

Now a friend might come and start watching the movie, and at some point get tired and take a break too. He also records the movie name, time, and frame, so when he wants to resume he simply jumps to the time he recorded. You see, you share the same movie without blocking each other.

The record each of you holds is called a context, because it is the information you will use the next time you want to resume the movie.

The CPU works in a similar way. Registers are memory units in the CPU that can hold information.

There are many registers in the CPU.

These registers have specific purposes, and the CPU uses them solely to execute the instructions in our program. During execution, they are filled with information that shows (and also directs) how the execution has gone so far.

eip holds the address of the next instruction to be executed.

esp points to the top of the stack in the address space.

eax, ebx, ecx, and edx are general-purpose registers that can hold anything from variable values to memory addresses.

Saving registers, then, means that the OS reads all the values in the registers and saves them to a table somewhere in RAM. When they are needed again, the values are retrieved from the table and loaded back into the registers.
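The save-and-restore step can be modelled in a few lines of JavaScript. This is only a sketch: the cpu object, the contextTable map, and the helper names saveContext and restoreContext are hypothetical stand-ins for illustration, and a real OS does this in kernel code rather than in JavaScript.

```javascript
// Hypothetical model: the CPU's registers as a plain object.
const cpu = { eip: 0, esp: 0, eax: 0, ebx: 0, ecx: 0, edx: 0 }

// The "table somewhere in RAM": one saved snapshot per thread id.
const contextTable = new Map()

// Copy every register value into the table under the thread's id.
function saveContext(threadId) {
  contextTable.set(threadId, { ...cpu })
}

// Copy the saved values back into the registers.
function restoreContext(threadId) {
  const saved = contextTable.get(threadId)
  if (!saved) throw new Error(`no saved context for ${threadId}`)
  Object.assign(cpu, saved)
}

// The main program runs for a while, then gets switched out.
cpu.eip = 0x1000
cpu.eax = 42
saveContext("main")

// Another thread runs and overwrites the registers.
cpu.eip = 0x2000
cpu.eax = 7
saveContext("thread-1")

// Switching back: the main program resumes exactly where it left off.
restoreContext("main")
console.log(cpu.eip.toString(16), cpu.eax) // 1000 42
```

Each thread gets its own entry in the table, so switching between any number of threads only ever needs one save and one restore.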

This happens so fast, many times every second, that to our human senses everything appears to happen at once.

Do you doubt it? Imagine a clock whose long hand switches between 12 and 1 twice a second. We can perceive that rate, so we see the hand moving back and forth. Increase the rate to 10,000 switches per second and it becomes so fast that we can hardly keep up. Increase it to 10 million and the hand will appear to stand still.

Likewise, when a car moves slowly we can see its wheels turning, and the faster the car moves, the faster the wheels turn. On a speeding sports car, the wheels appear not to be turning at all.

The switching in our CPU happens so fast that the programs appear to be running at the same time.

So on a single CPU core the threads do not actually run at the same instant; they take turns, one after another :) On a multi-core CPU, threads can also run truly in parallel on different cores.
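To make the turn-taking concrete, here is a small simulation, not how an OS is actually written. Each "thread" is a generator function, and a hypothetical runRoundRobin scheduler gives every task one step at a time. The names task and runRoundRobin are assumptions made for this sketch.

```javascript
// Each "thread" is a generator: every yield marks a point where the
// scheduler may pause it and let another one run.
function* task(name, steps) {
  for (let i = 1; i <= steps; i++) {
    yield `${name} step ${i}`
  }
}

// Round-robin: run one step of each task in turn until all are done.
function runRoundRobin(tasks) {
  const log = []
  const queue = [...tasks]
  while (queue.length > 0) {
    const current = queue.shift()
    const { value, done } = current.next() // resume where it paused
    if (!done) {
      log.push(value)
      queue.push(current) // back of the line for its next turn
    }
  }
  return log
}

const log = runRoundRobin([task("main", 3), task("worker", 2)])
console.log(log.join(", "))
// main step 1, worker step 1, main step 2, worker step 2, main step 3
```

The generator remembers its position between calls to next(), which plays the role of the saved context: nothing runs at the same instant, yet both tasks make progress.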

Heavy computations are moved to run in a separate thread so that the main thread can run without a hitch.

Web Workers provide an API for running JavaScript code in a thread other than the main thread the page’s script runs in. When we run a CPU-intensive operation in the main thread, we are greeted by the familiar “unresponsive script” dialog.

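<script>
// busy-wait for 10 seconds: unlike a timer, this loop keeps the main thread occupied
function longOp() {
    const end = Date.now() + 10000
    while (Date.now() < end) {}
}
longOp()
</script>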

With the advancement of the web, more and more operations are becoming CPU-intensive, so Web Workers are now the tool of choice for keeping your app fast and responsive.

To create a Worker, we instantiate the Worker class, passing in the name of the script.

const worker = new Worker("script.js")

The script contains the code we want to execute in the Worker. The Worker will load the code in script.js and execute it in another thread, in parallel with the main thread.

How do the two threads, the main thread and the Worker thread, communicate with each other?

Threads run in different contexts within the main program’s address space and have different resources. To let them communicate, the OS implements what we call shared resources: a space in RAM that threads can read from and write to in order to send messages to each other. To send a message, a thread writes it to the shared resource; to receive one, a thread reads from the shared resource.

Web Workers come with a handful of useful APIs that help us use them to our advantage.

The onmessage handler is used to receive a message posted with the postMessage(...) method.

const webWorker = new Worker("script.js")
webWorker.onmessage = function(evt) {
    const data = evt.data
    console.log(data) // 23
}

We instantiate a Worker and pass script.js to it. We set an onmessage function to capture messages from the Worker thread. The onmessage function accepts an evt object whose data property contains the message.

// script.js
postMessage(23)

In the script.js file, we post a message from the Worker thread using the postMessage(…) function.

The onerror handler is called when the Worker throws an error.

const webWorker = new Worker("script.js")
webWorker.addEventListener("install",(evt) => {
    console.log(evt)
})
webWorker.onerror = function(err) {
    console.log("Error encountered")
}

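The postMessage function lets us send data between the main (DOM) thread and the Worker thread. Called on the Worker object in the page, it sends data to the Worker; called inside the Worker script, it sends data back to the page.

const webWorker = new Worker("script.js")
// here, we pass a number 23 to the Worker
webWorker.postMessage(23)
// here, we pass a string to the Worker
webWorker.postMessage("message")

Internally, postMessage does not hand the receiver the original value: the data is copied and placed on the receiving thread’s message queue, and the receiver then reads it from the queue in its message handler. Above, postMessage(23) writes 23 to the queue, and the Worker thread reads from the queue to get the data.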
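In browsers, the data handed to postMessage is copied with the structured clone algorithm rather than shared between the threads. The global structuredClone function applies the same algorithm, so the copy behaviour can be tried without a Worker. This is a sketch that assumes a current browser or Node.js 17+; the message object is made up for illustration.

```javascript
// A hypothetical message we might send to a worker.
const message = { id: 1, values: [1, 2, 3] }

// postMessage copies its argument with the structured clone algorithm;
// structuredClone applies the same algorithm directly.
const received = structuredClone(message)

// Changing the copy on the "receiving" side ...
received.values.push(4)

// ... leaves the sender's object untouched.
console.log(message.values.length)  // 3
console.log(received.values.length) // 4

// Functions cannot be cloned, so they cannot be posted either.
let threw = false
try {
  structuredClone({ callback: () => {} })
} catch (err) {
  threw = true
}
console.log(threw) // true
```

This is why a Worker cannot change the page’s variables by accident: each side only ever sees its own copy of a message.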

The terminate() method immediately stops the Worker, even if it has not completed its operations.

const webWorker = new Worker("script.js")
webWorker.addEventListener("install",(evt) => {
    console.log(evt)
})
webWorker.terminate()

The addEventListener method enables us to add an event listener to the Worker.

addEventListener("event", cb)

The first argument is the name of the event we want to listen for. The second argument, cb, is the callback function that is executed when the event is triggered.

For example,

const webWorker = new Worker("script.js")
webWorker.addEventListener("install",(evt) => {
    console.log(evt)
})

When the install event is fired, the callback

(evt) => {
    console.log(evt)
}

will be executed. Now, let’s look at how the event is fired.

The dispatchEvent method is used to dispatch an event to the Worker object, where the listeners registered with addEventListener pick it up.

To fire the install event we defined in the above addEventListener section, we will create an Event passing in the name of the event to be fired and call dispatchEvent with the instance of the Event class:

const webWorker = new Worker("script.js")
webWorker.addEventListener("install",(evt) => {
    console.log(evt)
})
webWorker.dispatchEvent(new Event('install'))

An event listener subscribed with addEventListener is unsubscribed with removeEventListener.

const webWorker = new Worker("script.js")
webWorker.addEventListener("install",(evt) => {
    console.log(evt)
})

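To remove the install event listener, we call removeEventListener with install and the same callback function we registered, so we keep a reference to the callback:

const webWorker = new Worker("script.js")
// keep a reference to the callback so it can be removed later
const onInstall = (evt) => {
    console.log(evt)
}
webWorker.addEventListener("install", onInstall)
webWorker.removeEventListener("install", onInstall)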
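Note that removeEventListener only removes a listener when it receives the same function reference that was registered. Worker inherits addEventListener, dispatchEvent, and removeEventListener from EventTarget, so the sketch below uses a plain EventTarget as a stand-in for the Worker object. The names target and onInstall are chosen for illustration.

```javascript
// A plain EventTarget stands in for the Worker object here.
const target = new EventTarget()

let calls = 0
const onInstall = () => { calls++ }

target.addEventListener("install", onInstall)
target.dispatchEvent(new Event("install")) // calls is now 1

// A new, identical-looking function is a different reference:
// nothing is removed.
target.removeEventListener("install", () => { calls++ })
target.dispatchEvent(new Event("install")) // calls is now 2

// Passing the original reference removes the listener.
target.removeEventListener("install", onInstall)
target.dispatchEvent(new Event("install")) // calls stays at 2

console.log(calls) // 2
```

The same reasoning applies to any listener added with an inline arrow function: without a saved reference, it can no longer be removed.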

Workers become useful when you want to do the following:

  • Encoding/decoding a large string
  • Complex mathematical calculations (e.g., fibonacci, prime numbers, encryption, simulated annealing, etc.)
  • Sorting a large array
  • Network requests and resulting data processing
  • Calculations and data manipulation on local storage
  • Prefetching and/or caching data
  • Code syntax highlighting or other real-time text analysis (e.g., spell checking)
  • Image manipulation
  • Analyzing or processing video or audio data (including face and voice recognition)
  • Background I/O
  • Polling web services
  • Processing large arrays or huge JSON responses

Here, we will demo how to use a Worker to calculate the Fibonacci number at a given position.

The Fibonacci series is a sequence characterized by the fact that every number after the first two is the sum of the two preceding ones.

1,1,2,3,5,8,13,21,34,55,89

Each number is the sum of the previous two.

F3 = F2 + F1 => (2 = 1+1)
F5 = F4 + F3 => (5 = 3+2)
Therefore,
Fn = Fn-1 + Fn-2

We can now implement this in JS:

function fibonacci(num) {
    // base cases: the first two numbers of the series are both 1
    if (num === 1 || num === 2) {
        return 1
    }
    return fibonacci(num - 1) + fibonacci(num - 2)
}

The function is recursive whenever num exceeds 2: it calls itself with num decremented by 1 and by 2.

The browser thread can handle small inputs without trouble, but because this recursion makes an exponentially growing number of calls, larger inputs (already in the 40s) will cause the browser thread to hang and seem unresponsive. So the best bet is to move the calculation to a Worker thread.
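To get a feel for how quickly the work grows, we can count the calls the recursive function makes. The countFibCalls wrapper below is a hypothetical helper added for illustration; the recursion inside it is the same as in the fibonacci function.

```javascript
// Same recursion as fibonacci, but counting how many calls it makes.
function countFibCalls(num) {
  let calls = 0
  function fib(n) {
    calls++
    if (n === 1 || n === 2) return 1
    return fib(n - 1) + fib(n - 2)
  }
  const value = fib(num)
  return { value, calls }
}

console.log(countFibCalls(10)) // { value: 55, calls: 109 }
console.log(countFibCalls(20)) // { value: 6765, calls: 13529 }
console.log(countFibCalls(30)) // { value: 832040, calls: 1664079 }
```

Raising the input by 10 multiplies the number of calls by more than a hundred, which is why the main thread gives out long before the inputs look large.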

Let’s see how it can be done.

First, we set up a Node project:

mkdir workerprj
cd workerprj
npm init -y

Next install http-server:

npm i http-server

Edit the scripts section of your package.json to look like this:

"scripts": {
"start": "http-server ./"
}

Create the index.html and fibonacci.js files:

touch index.html
touch fibonacci.js

fibonacci.js will contain our Worker code.

Our design will be like this. Our index.html will have an input box where we enter a number. On clicking a button, index.html will post the number to fibonacci.js, which is executing in the Worker thread. fibonacci.js will calculate the Fibonacci number for that input and feed the result back to the browser thread in index.html.

Let’s begin with index.html:

<html>
<body>
    <div>
        <input type="number" id="number" placeholder="Enter any number" />
        <button onclick="sendToWorker()">Calc. Fibonacci</button>
    </div>
    <div id="output"></div>
    <script>
        const fibWorker = new Worker("fibonacci.js")
        // show the result posted back by the Worker
        fibWorker.addEventListener('message', function(evt) {
            document.getElementById('output').innerHTML = evt.data
        })
        // send the number in the input box to the Worker
        function sendToWorker() {
            const value = Number(document.getElementById('number').value)
            fibWorker.postMessage(value)
        }
    </script>
</body>
</html>

It is quite straightforward. We have an input box with id number and a button which calls the sendToWorker function when clicked. Next, we have a div with id output, which will display the result of a Fibonacci calculation.

In our script section, we initialize a Worker, passing the fibonacci.js script to it. We add an event listener for message; it listens for messages dispatched with the postMessage API in the Worker thread. In the callback, we get a reference to the output div using document.getElementById(...) and display the posted Fibonacci result in evt.data using the innerHTML property.

Next is the sendToWorker function. It first gets the number in the input box and then sends it to the Worker thread.

Now, let’s look at the fibonacci.js file:

function fibonacci(num) {
    if (num === 1 || num === 2) {
        return 1
    }
    return fibonacci(num - 1) + fibonacci(num - 2)
}
self.addEventListener('message', (evt) => {
    // the value may arrive as a string, so convert it first
    const num = Number(evt.data)
    postMessage(fibonacci(num))
})

We define a fibonacci function which calculates the Fibonacci number for any position passed to it.

Next, we add an event listener for message; it captures events emitted by the postMessage API in the browser thread. The callback function receives the emitted data in the evt.data property, calculates the Fibonacci number for it, and posts the result to the browser thread.

Now we see that the main job, the Fibonacci calculation, is done in the Worker thread, not in the browser thread. Whenever we press the Calc. Fibonacci button, the page hands the calculation to another thread and then waits for that thread to complete and send back the result.
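One thing the Worker above does not do is check its input, so an empty box or a very large number will still keep it busy or make it fail. A hedged sketch of a more defensive fibonacci.js follows. The buildReply helper and the MAX_INPUT limit are assumptions introduced here, not part of the Worker API, and the page would then need to read evt.data.result or evt.data.error instead of evt.data.

```javascript
function fibonacci(num) {
  if (num === 1 || num === 2) return 1
  return fibonacci(num - 1) + fibonacci(num - 2)
}

// Assumed limit, chosen so the naive recursion stays quick.
const MAX_INPUT = 35

// Hypothetical helper: turn raw message data into a reply object.
function buildReply(data) {
  const num = Number(data) // input values may arrive as strings
  if (!Number.isInteger(num) || num < 1 || num > MAX_INPUT) {
    return { ok: false, error: `expected an integer from 1 to ${MAX_INPUT}` }
  }
  return { ok: true, result: fibonacci(num) }
}

// Only wire up the listener when actually running inside a worker.
if (typeof WorkerGlobalScope !== "undefined" && self instanceof WorkerGlobalScope) {
  self.addEventListener("message", (evt) => {
    postMessage(buildReply(evt.data))
  })
}

console.log(buildReply("10"))  // { ok: true, result: 55 }
console.log(buildReply("abc")) // { ok: false, error: 'expected an integer from 1 to 35' }
```

Keeping the validation in a plain function means it can be tried outside a Worker, as the two console.log calls do.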

To see it in action, run the following in your terminal:

npm run start

Open your favorite browser and navigate to 127.0.0.1:8080.

Next, we will use a Worker to accomplish prefetching. What is prefetching? It is the loading of a resource in the background so that, when we actually navigate to it, it simply loads from the cache and therefore loads very fast.

Prefetching is best done in a Worker thread, so as to leave the browser thread to its rendering job.

We start by making our index.html look like this:

<html>
<body>
    <div>
        <button onclick="sendToWorker()">Start Prefetching</button>
    </div>
    <a href="./test.html">test.html</a>
    <a href="./test1.html">test1.html</a>
    <a href="./test2.html">test2.html</a>
    <script>
        const prefWorker = new Worker("pref.js")
        // tell the Worker to start prefetching
        function sendToWorker() {
            prefWorker.postMessage('start')
        }
    </script>
</body>
</html>

The links test.html, test1.html, and test2.html are the files we want to prefetch, so that when we click on them they load from the cache, not from the network again.

We have a Start Prefetching button; when clicked, it posts the message start to the Worker, which tells the Worker to start prefetching the links above.

Let’s look at our pref.js:

const prefetchURLs = [
    "./test.html",
    "./test1.html",
    "./test2.html"
]
self.onmessage = function(evt) {
    const data = evt.data
    switch (data) {
        case 'start':
            prefetchURLs.forEach((url) => {
                fetch(url)
            })
            break;
    }
}

We have the links in an array. We set up an onmessage handler, which runs when postMessage is called in the browser thread. A switch statement executes a block based on what postMessage sent; here we have a case for the start message. It loops through prefetchURLs and prefetches each URL using the Fetch API.
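The loop in pref.js fires its requests and ignores whether they succeed. As a sketch of one way to keep track of that, the hypothetical prefetchAll helper below waits for every request with Promise.allSettled and reports which URLs were fetched and which failed. The helper, the fakeFetch stand-in, and the pretend 404 are all assumptions made so the example can be tried without a server.

```javascript
// Hypothetical helper: prefetch every URL and summarise the outcome.
// fetchFn defaults to the global fetch; passing it in makes the sketch
// easy to try without a server.
async function prefetchAll(urls, fetchFn = fetch) {
  const results = await Promise.allSettled(
    urls.map(async (url) => {
      const response = await fetchFn(url)
      if (!response.ok) throw new Error(`${url}: HTTP ${response.status}`)
      return url
    })
  )
  return {
    fetched: results.filter((r) => r.status === "fulfilled").map((r) => r.value),
    failed: results.filter((r) => r.status === "rejected").map((r) => r.reason.message),
  }
}

// Stand-in for fetch: pretends that test1.html is missing.
const fakeFetch = async (url) =>
  url.endsWith("test1.html") ? { ok: false, status: 404 } : { ok: true, status: 200 }

const summaryPromise = prefetchAll(
  ["./test.html", "./test1.html", "./test2.html"],
  fakeFetch
)
summaryPromise.then((summary) => console.log(summary))
```

Inside the Worker, the same helper could be called from the start case with the real fetch, and the summary posted back to the page with postMessage.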

How is this prefetching? Browsers have a cache in which they store resources accessed over the network. When the same resource is requested again, the browser first checks whether it is already in the cache: if it is, the browser serves it from the cache; if not, it requests it from the network and caches it.

So clicking on a link or using any of the network request APIs (fetch, XMLHttpRequest, etc.) triggers a network request, and the browser tries to cache the response so that subsequent requests are served from the cache.
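The check-the-cache-first logic can be sketched as follows. This is a deliberately simplified model: the cache map and the load and requestFromNetwork functions are made up for illustration, and a real browser cache also takes HTTP cache headers and expiry into account.

```javascript
// Hypothetical model of the browser's HTTP cache: URL -> response body.
const cache = new Map()
let networkRequests = 0

// Stand-in for a real network request.
function requestFromNetwork(url) {
  networkRequests++
  return `contents of ${url}`
}

// Serve from the cache when possible, otherwise go to the network
// and remember the response for next time.
function load(url) {
  if (cache.has(url)) {
    return { body: cache.get(url), from: "cache" }
  }
  const body = requestFromNetwork(url)
  cache.set(url, body)
  return { body, from: "network" }
}

console.log(load("./test.html").from) // network (the prefetch)
console.log(load("./test.html").from) // cache (the later click)
console.log(networkRequests)          // 1
```

The first call plays the part of the Worker’s fetch and the second the part of the click on the link: only the first one reaches the network.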

First, populate our files:

<!-- test.html -->
<html>
<b>
Prefetched HTML file
</b>
</html>
<!-- test1.html -->
<html>
<b>
Prefetched HTML file 1
</b>
</html>
<!-- test2.html -->
<html>
<b>
Prefetched HTML file 2
</b>
</html>

Our project directory should now look like this:

- workerprj/
    - test.html
    - test1.html
    - test2.html
    - package.json
    - index.html
    - pref.js

Now, let’s test our code:

npm run start

Navigate to 127.0.0.1:8080

Now, we see our index.html rendered with links test.html, test1.html, test2.html. Open your DevTools.

Look at the Network tab. The Transferred column shows the number of bytes transferred for each file, which indicates that the files were served from the network.

Click on the Start Prefetching button.

We will see the Network tab populated with test.html, test1.html, and test2.html.

This is because the fetch calls in pref.js requested them over the network. The Transferred column again shows the number of bytes transferred for each file, because the files weren’t served from the cache.

Now, let’s stop our server.

See, our server is stopped.

Now, let’s click on test2.html link

See, our test2.html loaded! This is because when we performed the fetch request on test2.html in our pref.js file, the browser cached the response. So when we clicked on the link, instead of requesting the file from the network, the browser served it from the cache, where it had been stored on the first request. The Transferred column reads cached, meaning the file was served from the cache.

Let’s click on test1.html

Boom! The same thing happens here.

So we have demonstrated how to prefetch links using workers.

d8 is a shell for the V8 engine. It lets us run a JS file in the console, just like Node.js.

d8 shell.js

To see the source code of d8, go to https://github.com/v8/v8/src/d8.cc

Web Workers are implemented in the d8 shell, so we can write a worker just as we do in the browser.

// shell.js
const worker = new Worker('script.js')
worker.onmessage = function(evt) {
    print(evt.data) // 34
}

// script.js
postMessage(34)

This is the same as what we did before, except that we print to the console with print, d8’s built-in output function, instead of console.log(...). shell.js is loaded by d8 and creates the Worker; script.js runs in the Worker thread and posts 34 to the main thread, where the data is captured by the onmessage function in shell.js.

We have covered what we know about Workers, starting from what multi-threading is and ending with some demos of Workers in action.

Workers are an optimization everyone should consider when developing for the next billion users. If you have any questions about this, or anything I should add, correct, or remove, feel free to comment below. I’d be happy to talk. Thanks 😄

