We know that any discrete-time signal can be represented by a
summation of scaled and shifted discrete-time impulses, see
(Reference). Since
we are assuming the system to be linear and time-invariant, it
would seem to reason that an input signal comprised of the sum
of scaled and shifted impulses would give rise to an output
comprised of a sum of scaled and shifted impulse responses.
This is exactly what occurs in convolution.
Below we present a more rigorous and mathematical look at the
derivation:

Letting ℋℋ be a discrete time LTI
system, we start with the folowing equation and work our way
down the the convoluation sum.

yn=ℋxn=ℋ∑
k
=−∞∞xkδn−k=∑
k
=−∞∞ℋxkδn−k=∑
k
=−∞∞xkℋδn−k=∑k=−∞∞xkhn−k
y
n
ℋ
x
n
ℋ
k
x
k
δ
n
k
k
ℋ
x
k
δ
n
k
k
x
k
ℋ
δ
n
k
k
x
k
h
n
k

(1)
Let us take a quick look at the steps taken in the above
derivation. After our initial equation we rewrite the function

xn
x
n
as a sum of the function times the unit impulse. Next, we can
move around the ℋ operator and the summation because

ℋ˙
ℋ
˙
is a linear, DT system. Because of this linearity
and the fact that

xk
x
k
is a constant, we pull the constant out and simply multiply it by

ℋ˙
ℋ
˙
. Finally, we use the fact that

ℋ˙
ℋ
˙
is time invariant in order to reach our final state
- the convolution sum!

Above the summation is taken over all integers. Howerer, in many practical
cases either
xn
x
n
or
hn
h
n
or both
are finite, for which case the summations will be limited.
The convolution equations are simple tools which, in principle, can be used
for all input signals. Following is an example to demonstrate convolution;
how it is calculated and how it is interpreted.

A quick graphical example may help in demonstrating why
convolution works.