Listen to Your Images: Unveiling the Voices of Images

Thursday, 21 March 2024

numpy python scikit-learn scipy sonification

In the realm of data representation, visualization is the de facto standard tool. Graphs have long been the go-to method for exploring raw data and extracting information, and data can almost always be converted into a form suitable for display. But what if we convert our data into a form that can be heard instead?

What is Sonification?

Sonification is the process of presenting data as sounds. In image data sonification, visual information contained within an image, such as colors, shapes, and spatial organization, is converted into audible representations. By assigning different acoustic properties to various image components, we can unlock a new mode of understanding and exploring visual datasets.

What are we going to do?

In this post, we are going to convert JPG images into WAV audio files. It is a tiny, fun project: each WAV file produced will be a series of noises rather than a piece of ear-catching music. We use Python 3 with the NumPy, scikit-image, SciPy, and progress packages.

A WAV file is a simple uncompressed digital audio file. For the sake of simplicity, think of a WAV file as a list of sample points that represent the actual analog audio waveform. The more samples captured per second of analog audio (the sample rate), the more accurately the digital audio reproduces the original. WAV files can store samples in several formats; in this post, each sample is a 32-bit floating-point number in the range [-1, +1].
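To make the idea of a sample list concrete, here is a minimal sketch that writes one second of a sine tone as float samples in [-1, +1]. The 440 Hz frequency, the half amplitude, and the file name tone.wav are illustrative assumptions and not part of the project itself.

```python
import numpy as np
from scipy.io import wavfile

SAMPLE_RATE = 44100   # samples per second of audio
FREQUENCY = 440.0     # illustrative tone frequency in Hz (an assumption)

# One second of audio needs SAMPLE_RATE time points.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE

# A sine wave scaled to half amplitude stays well inside [-1, +1].
samples = (0.5 * np.sin(2 * np.pi * FREQUENCY * t)).astype(np.float32)

# "tone.wav" is a hypothetical output name.
wavfile.write("tone.wav", SAMPLE_RATE, samples)
```

The array holds 44,100 numbers, which is exactly one second of sound at this sample rate.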

To create samples, we are going to scan all the pixels of the loaded JPG image. Each pixel stores three values, one per RGB color channel, and each value is in the range [0, 255]. Every pixel therefore yields three audio samples, once its values are mapped from the range [0, 255] to [-1, +1].

Let's start ...

The following formula will do the mapping:

span = (1-(-1)) / 255
sound_value = -1 + color_value * span
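As a quick check of the formula, the sketch below wraps it in a small function and evaluates it at the two ends and the middle of the color range. The names SPAN and color_to_sample are hypothetical helpers introduced only for this illustration.

```python
SPAN = (1 - (-1)) / 255  # width of one color step in the audio range


def color_to_sample(color_value: int) -> float:
    """Map a color channel value in [0, 255] to an audio sample in [-1, +1]."""
    return -1 + color_value * SPAN


# The darkest value maps to -1, the brightest to +1, the middle to near 0.
for value in (0, 128, 255):
    print(value, color_to_sample(value))
```

Note that no integer color value maps exactly to 0: the value 128 lands slightly above it, at 1/255.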

To start, let's import some packages and define a couple of values:

import numpy as np
import skimage.io
from scipy.io import wavfile
from progress.bar import Bar

__SPAN = 0.00784313725490196  # (1-(-1))/255
__SAMPLE_RATE = 44100

Here is the skeleton of our function:

def encodeImage(imageFile: str, outputFile: str):
    pass

The first thing we do in the function is read the image data and get its number of rows and columns. We use scikit-image to read the image file; it returns a multi-dimensional NumPy array:

img = skimage.io.imread(imageFile)
rows = img.shape[0]
cols = img.shape[1]
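The rest of the function assumes three channels per pixel, but an image reader can also return a two-dimensional array for a grayscale image or four channels for an image with transparency. A minimal sketch of a guard for those cases follows; to_rgb is a hypothetical helper that is not part of the original code, and it uses only NumPy.

```python
import numpy as np


def to_rgb(img: np.ndarray) -> np.ndarray:
    """Return an array of shape (rows, cols, 3) from a grayscale, RGB, or RGBA array."""
    if img.ndim == 2:
        # Grayscale: repeat the single channel three times.
        return np.stack([img, img, img], axis=-1)
    if img.ndim == 3 and img.shape[2] == 4:
        # RGBA: drop the alpha channel.
        return img[:, :, :3]
    if img.ndim == 3 and img.shape[2] == 3:
        # Already RGB: nothing to do.
        return img
    raise ValueError(f"unsupported image shape: {img.shape}")


gray = np.zeros((2, 5), dtype=np.uint8)
print(to_rgb(gray).shape)  # (2, 5, 3)
```

Calling such a helper right after reading the image would let the pixel loop unpack three values for every pixel.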

Then a new NumPy array is needed to store the final audio data. As we are going to create three audio samples per pixel, the length of this array should be three times the number of pixels:

wave = np.zeros(rows * cols * 3, dtype=np.float32)

The main loop iterates over all the pixels, extracts each pixel's RGB values, maps them to the audio range, and stores the results in the audio array. A progress bar, advanced once per row, keeps the user informed:

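bar = Bar("Encoding", max=rows)  # one progress step per image row
index = 0  # position of the next free sample in wave
for row in range(rows):
    for col in range(cols):
        r, g, b = img[row, col]  # the pixel's three channel values
        # map each channel from [0, 255] to [-1, +1]
        wave[index] = -1.0 + r*__SPAN
        wave[index+1] = -1.0 + g*__SPAN
        wave[index+2] = -1.0 + b*__SPAN
        index += 3
    bar.next()
bar.finish()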
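For large images, a Python loop over every pixel can be slow. As an alternative, the same mapping can be applied to the whole array in one NumPy expression. The sketch below compares both approaches on a small random stand-in image; encode_loop and encode_vectorized are hypothetical names, and the image is assumed to be an 8-bit RGB array of shape (rows, cols, 3).

```python
import numpy as np

SPAN = (1 - (-1)) / 255  # same formula as the __SPAN constant


def encode_loop(img: np.ndarray) -> np.ndarray:
    """Pixel-by-pixel mapping, mirroring the loop above without the progress bar."""
    rows, cols = img.shape[:2]
    wave = np.zeros(rows * cols * 3, dtype=np.float32)
    index = 0
    for row in range(rows):
        for col in range(cols):
            r, g, b = img[row, col]
            wave[index] = -1.0 + r * SPAN
            wave[index + 1] = -1.0 + g * SPAN
            wave[index + 2] = -1.0 + b * SPAN
            index += 3
    return wave


def encode_vectorized(img: np.ndarray) -> np.ndarray:
    """Same mapping applied to the whole array at once."""
    # reshape(-1) flattens row by row, keeping each pixel's R, G, B together.
    return (-1.0 + img.reshape(-1) * SPAN).astype(np.float32)


# Small random stand-in image (an assumption, used instead of a real JPG).
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 6, 3)).astype(np.uint8)
print(np.allclose(encode_loop(img), encode_vectorized(img)))
```

The vectorized version gives up the per-row progress bar, which matters less when the whole conversion is a single array operation.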

And finally, we use SciPy to save the final file:

wavfile.write(outputFile, __SAMPLE_RATE, wave)
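One way to sanity-check the result is to read the file back and look at its sample rate, sample count, and duration. The sketch below does this with random stand-in samples rather than a real image; the file name check.wav and the 10x10 image size are assumptions made for the example.

```python
import numpy as np
from scipy.io import wavfile

SAMPLE_RATE = 44100

# Stand-in for the encoded audio of a 10x10 RGB image (random samples, an assumption).
rng = np.random.default_rng(0)
wave = rng.uniform(-1.0, 1.0, size=10 * 10 * 3).astype(np.float32)

# "check.wav" is a hypothetical file name.
wavfile.write("check.wav", SAMPLE_RATE, wave)

# Read the file back: rate is the sample rate, data the stored samples.
rate, data = wavfile.read("check.wav")
duration = len(data) / rate  # seconds of audio

print(rate, data.dtype, len(data))
print(f"{duration:.4f} s")
```

The same arithmetic predicts the length of any output: a 640x480 image yields 640 * 480 * 3 = 921,600 samples, or roughly 20.9 seconds of audio at 44,100 samples per second.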

Now you can call the encodeImage function with the path of a JPG image and the path of the output file to generate your WAV files.

The complete source code can be found on GitHub.