Introduction to Partial Derivatives

Partial derivatives extend the concept of derivatives to functions of multiple variables. While ordinary derivatives measure how a function changes with respect to one variable, partial derivatives measure how a multivariable function changes when only one variable is varied while others are held constant.

Why Partial Derivatives Matter:

  • Essential for understanding surfaces and functions in higher dimensions
  • Foundation for gradient descent optimization algorithms
  • Critical in physics for describing rates of change in multiple dimensions
  • Key component in machine learning for backpropagation
  • Used in economics for marginal analysis with multiple variables

Real-World Example:

The temperature T(x, y, z, t) at a point (x, y, z) at time t has partial derivatives:

∂T/∂x: Rate of temperature change in x-direction

∂T/∂t: Rate of temperature change over time

Definition & Notation

The partial derivative of a function f(x, y) with respect to x at a point (a, b) is defined as:

∂f/∂x(a, b) = lim_(h→0) [f(a + h, b) - f(a, b)] / h

Similarly, the partial derivative with respect to y is:

∂f/∂y(a, b) = lim_(h→0) [f(a, b + h) - f(a, b)] / h
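The limit definition can be approximated numerically by fixing a small h instead of taking the limit. The sketch below is a minimal illustration; the helpers partial_x and partial_y are our own names, not from any library. It uses f(x, y) = x²y at (1, 2), where the exact values are ∂f/∂x = 2xy = 4 and ∂f/∂y = x² = 1.

```python
def partial_x(f, a, b, h=1e-6):
    """Forward-difference quotient [f(a + h, b) - f(a, b)] / h for ∂f/∂x at (a, b)."""
    return (f(a + h, b) - f(a, b)) / h


def partial_y(f, a, b, h=1e-6):
    """Forward-difference quotient [f(a, b + h) - f(a, b)] / h for ∂f/∂y at (a, b)."""
    return (f(a, b + h) - f(a, b)) / h


def f(x, y):
    return x**2 * y


# The quotient approaches the exact value 2xy = 4 at (1, 2) as h shrinks.
for h in (1e-1, 1e-3, 1e-5):
    print(h, partial_x(f, 1.0, 2.0, h))
```

For this f the quotient equals 4 + 2h exactly (before rounding), so the error shrinks linearly with h. Making h extremely small eventually lets floating-point round-off dominate, which is one reason symmetric (central) differences are often preferred in practice.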

Common Notations for Partial Derivatives

  • ∂f/∂x (Leibniz notation): the most common; makes explicit which variable is changing
  • fx (subscript notation): compact; common in physics and engineering
  • ∂ₓf (operator notation): emphasizes the derivative as an operator applied to f
  • D₁f (numerical subscript): the derivative with respect to the first variable

Geometric Interpretation

The partial derivative ∂f/∂x(a, b) represents the slope of the tangent line to the curve formed by intersecting the surface z = f(x, y) with the plane y = b.

Visualization: slicing the surface z = f(x, y) with the vertical plane y = b produces a curve; the slope of that curve at x = a is ∂f/∂x(a, b).

Computation Methods

Computing partial derivatives follows the same rules as ordinary derivatives, with one key difference: treat all other variables as constants.

1. Basic Rule

To compute ∂f/∂x, treat y as a constant and differentiate with respect to x using ordinary derivative rules.

Example: f(x, y) = x²y + sin(x)

∂f/∂x = 2xy + cos(x)

2. Product Rule

For f(x, y) = u(x, y)·v(x, y):

∂f/∂x = (∂u/∂x)·v + u·(∂v/∂x)

Example: f(x, y) = x·e^(xy)

∂f/∂x = 1·e^(xy) + x·y·e^(xy) = (1 + xy)·e^(xy)

3. Quotient Rule

For f(x, y) = u(x, y)/v(x, y):

∂f/∂x = [(∂u/∂x)·v - u·(∂v/∂x)] / v²

Example: f(x, y) = x/(x² + y²)

∂f/∂x = [1·(x² + y²) - x·2x] / (x² + y²)² = (y² - x²)/(x² + y²)²

4. Chain Rule

For f(x, y) = g(h(x, y)):

∂f/∂x = g'(h(x, y))·(∂h/∂x)

Example: f(x, y) = sin(x² + y)

∂f/∂x = cos(x² + y)·2x
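The rule examples can be spot-checked numerically: a central difference [f(x + h, y) - f(x - h, y)]/(2h) should agree with each hand-derived ∂f/∂x. In this Python sketch the test point (0.7, -0.4) is arbitrary, and for the quotient example we use the quotient-rule result ∂f/∂x = (y² - x²)/(x² + y²)².

```python
import math


def d_dx(f, x, y, h=1e-5):
    """Central-difference approximation of ∂f/∂x at (x, y)."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)


# (function, hand-derived ∂f/∂x) pairs for the product, quotient and chain rule examples
examples = {
    "product": (lambda x, y: x * math.exp(x * y),
                lambda x, y: (1 + x * y) * math.exp(x * y)),
    "quotient": (lambda x, y: x / (x**2 + y**2),
                 lambda x, y: (y**2 - x**2) / (x**2 + y**2) ** 2),
    "chain": (lambda x, y: math.sin(x**2 + y),
              lambda x, y: 2 * x * math.cos(x**2 + y)),
}

x0, y0 = 0.7, -0.4  # arbitrary test point away from the origin
for name, (f, dfdx) in examples.items():
    print(f"{name:9s} numeric={d_dx(f, x0, y0):.8f} exact={dfdx(x0, y0):.8f}")
```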

Detailed Example: f(x, y) = x³y² + e^(xy) + ln(x² + y)

Compute ∂f/∂x:

1. Differentiate x³y²: ∂/∂x(x³y²) = 3x²y² (treat y² as constant)

2. Differentiate e^(xy): ∂/∂x(e^(xy)) = y·e^(xy) (chain rule)

3. Differentiate ln(x² + y): ∂/∂x(ln(x² + y)) = 2x/(x² + y)

Result: ∂f/∂x = 3x²y² + y·e^(xy) + 2x/(x² + y)

Compute ∂f/∂y:

1. Differentiate x³y²: ∂/∂y(x³y²) = 2x³y (treat x³ as constant)

2. Differentiate e^(xy): ∂/∂y(e^(xy)) = x·e^(xy) (chain rule)

3. Differentiate ln(x² + y): ∂/∂y(ln(x² + y)) = 1/(x² + y)

Result: ∂f/∂y = 2x³y + x·e^(xy) + 1/(x² + y)
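The same kind of numerical check works for both partials of the detailed example. In this sketch the helper central (our own name) differentiates along a coordinate direction; the point (1.2, 0.5) is arbitrary but must satisfy x² + y > 0 so that ln(x² + y) is defined.

```python
import math


def f(x, y):
    return x**3 * y**2 + math.exp(x * y) + math.log(x**2 + y)


def fx(x, y):  # hand-derived ∂f/∂x
    return 3 * x**2 * y**2 + y * math.exp(x * y) + 2 * x / (x**2 + y)


def fy(x, y):  # hand-derived ∂f/∂y
    return 2 * x**3 * y + x * math.exp(x * y) + 1 / (x**2 + y)


def central(g, x, y, ux, uy, h=1e-5):
    """Central difference of g along the direction (ux, uy)."""
    return (g(x + h * ux, y + h * uy) - g(x - h * ux, y - h * uy)) / (2 * h)


x0, y0 = 1.2, 0.5  # requires x² + y > 0
print(fx(x0, y0), central(f, x0, y0, 1, 0))
print(fy(x0, y0), central(f, x0, y0, 0, 1))
```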

The Gradient Vector

The gradient of a scalar function f(x, y) is a vector field that points in the direction of the greatest rate of increase of the function.

∇f(x, y) = (∂f/∂x, ∂f/∂y)

For functions of three variables:

∇f(x, y, z) = (∂f/∂x, ∂f/∂y, ∂f/∂z)

Direction of Steepest Ascent

The gradient vector ∇f points in the direction where f increases most rapidly.

Magnitude: ||∇f|| gives the rate of increase in that direction.

Direction of Steepest Descent

-∇f points in the direction where f decreases most rapidly.

This is the basis for gradient descent optimization algorithms.

Level Curves & Surfaces

The gradient is perpendicular to level curves (2D) or level surfaces (3D).

For f(x, y) = c, ∇f is normal to the level curve at each point.

Directional Derivative

The rate of change of f in direction u (unit vector):

Dᵤf = ∇f·u = ||∇f|| cos θ, where θ is the angle between ∇f and u

Example: f(x, y) = x² + y²

Compute gradient:

∂f/∂x = 2x

∂f/∂y = 2y

∇f(x, y) = (2x, 2y)

At point (1, 2):

∇f(1, 2) = (2, 4)

Direction of steepest ascent: (2, 4) (or normalized: (1/√5, 2/√5))

Rate of increase: ||∇f|| = √(2² + 4²) = √20 ≈ 4.47
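The gradient example and the directional-derivative formula Dᵤf = ∇f·u can be reproduced in a few lines. The function names below are illustrative; grad simply returns the hand-computed gradient (2x, 2y).

```python
import math


def grad(x, y):
    """Hand-computed gradient of f(x, y) = x² + y², i.e. (2x, 2y)."""
    return (2 * x, 2 * y)


def directional_derivative(g, u):
    """Dᵤf = ∇f·u, where g is the gradient and u is a unit vector."""
    return g[0] * u[0] + g[1] * u[1]


g = grad(1.0, 2.0)                   # (2.0, 4.0)
norm = math.hypot(*g)                # ||∇f|| = √20 ≈ 4.47
u_star = (g[0] / norm, g[1] / norm)  # steepest-ascent direction (1/√5, 2/√5)
perp = (-u_star[1], u_star[0])       # a unit vector perpendicular to ∇f

print(directional_derivative(g, u_star))  # ≈ ||∇f|| since cos θ = 1
print(directional_derivative(g, perp))    # 0 up to rounding since cos θ = 0
```

Any other unit direction gives a value between -||∇f|| and ||∇f||, which is the cos θ factor in the formula.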

Gradient Vector Field Visualization

For f(x, y) = x² + y², the gradient vectors ∇f = (2x, 2y) point radially outward, and their length 2√(x² + y²) grows with distance from the origin.

Chain Rule for Partial Derivatives

The chain rule for multivariable functions has several forms depending on how variables depend on each other.

Case 1: z = f(x, y), x = x(t), y = y(t)

dz/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt)

Example: f(x, y) = x²y, x = cos(t), y = sin(t)

dz/dt = 2xy·(-sin(t)) + x²·cos(t)
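One way to confirm the Case 1 formula is to compare it with a finite-difference derivative of the composite z(t) = f(cos t, sin t) = cos²t·sin t. The evaluation point t = 0.9 and the step h are arbitrary choices in this sketch.

```python
import math


def dz_dt_chain(t):
    """dz/dt from the chain rule: (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt)."""
    x, y = math.cos(t), math.sin(t)
    df_dx, df_dy = 2 * x * y, x**2            # partials of f(x, y) = x²y
    dx_dt, dy_dt = -math.sin(t), math.cos(t)
    return df_dx * dx_dt + df_dy * dy_dt


def z(t):
    """The composite z(t) = f(cos t, sin t) = cos²t · sin t."""
    return math.cos(t) ** 2 * math.sin(t)


t0, h = 0.9, 1e-5
numeric = (z(t0 + h) - z(t0 - h)) / (2 * h)
print(dz_dt_chain(t0), numeric)
```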

Case 2: z = f(x, y), x = x(s, t), y = y(s, t)

∂z/∂s = (∂f/∂x)(∂x/∂s) + (∂f/∂y)(∂y/∂s)

∂z/∂t = (∂f/∂x)(∂x/∂t) + (∂f/∂y)(∂y/∂t)

Case 3: Implicit Differentiation

For F(x, y) = 0:

dy/dx = -(∂F/∂x)/(∂F/∂y), provided ∂F/∂y ≠ 0

Example: x² + y² = 1

dy/dx = -x/y
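For the circle example, the implicit formula can be compared against the explicit upper branch y = √(1 - x²). This sketch uses F(x, y) = x² + y² - 1 and the point (0.6, 0.8), where dy/dx = -x/y = -0.75; implicit_slope is an illustrative name.

```python
import math


def implicit_slope(Fx, Fy):
    """dy/dx = -(∂F/∂x)/(∂F/∂y); only valid where ∂F/∂y != 0."""
    return -Fx / Fy


x0 = 0.6
y0 = math.sqrt(1 - x0**2)                 # (0.6, 0.8) on the upper half of the circle
slope = implicit_slope(2 * x0, 2 * y0)    # ∂F/∂x = 2x, ∂F/∂y = 2y

# Differentiate the explicit branch y = sqrt(1 - x²) numerically for comparison.
h = 1e-6
explicit = (math.sqrt(1 - (x0 + h) ** 2) - math.sqrt(1 - (x0 - h) ** 2)) / (2 * h)
print(slope, explicit)  # both approximately -0.75
```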

Tree Diagram Method

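Draw branches from the dependent variable to the intermediate variables, and from those to the independent variables.

To find ∂z/∂t, multiply the partial derivatives along each path from z to t, then sum over all such paths.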

Detailed Example: z = x²y, x = s·t, y = s + t

Compute ∂z/∂s:

1. ∂z/∂x = 2xy

2. ∂z/∂y = x²

3. ∂x/∂s = t

4. ∂y/∂s = 1

Chain rule: ∂z/∂s = (∂z/∂x)(∂x/∂s) + (∂z/∂y)(∂y/∂s)

∂z/∂s = (2xy)(t) + (x²)(1) = 2xyt + x²

Substitute x = st, y = s + t: ∂z/∂s = 2(st)(s + t)(t) + (st)²

Final: ∂z/∂s = 2s²t² + 2st³ + s²t² = 3s²t² + 2st³
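The tree-diagram computation translates directly into code: each path z → x → s and z → y → s contributes one product, and the products are summed. This sketch checks the result against the closed form 3s²t² + 2st³ and against a finite difference of z(s, t) = (st)²(s + t); the point (1.5, -0.5) is arbitrary.

```python
def dz_ds_chain(s, t):
    """Sum over the two tree-diagram paths z→x→s and z→y→s."""
    x, y = s * t, s + t
    dz_dx, dz_dy = 2 * x * y, x**2   # partials of z = x²y
    dx_ds, dy_ds = t, 1              # partials of x = st and y = s + t
    return dz_dx * dx_ds + dz_dy * dy_ds


def dz_ds_closed_form(s, t):
    return 3 * s**2 * t**2 + 2 * s * t**3


def z(s, t):
    x, y = s * t, s + t
    return x**2 * y


s0, t0, h = 1.5, -0.5, 1e-5
numeric = (z(s0 + h, t0) - z(s0 - h, t0)) / (2 * h)
print(dz_ds_chain(s0, t0), dz_ds_closed_form(s0, t0), numeric)
```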

Higher-Order Partial Derivatives

Just as with single-variable functions, we can take partial derivatives of partial derivatives.

Notation for Second Partial Derivatives

  • ∂²f/∂x² (second partial with respect to x): differentiate f twice with respect to x
  • ∂²f/∂y∂x (mixed partial): differentiate first with respect to x, then with respect to y
  • fxx (subscript notation): f differentiated twice with respect to x
  • fxy (mixed partial, subscript notation): differentiate first with respect to x, then with respect to y

Clairaut's Theorem (Equality of Mixed Partials):

If fxy and fyx are continuous at a point, then they are equal at that point:

∂²f/∂x∂y = ∂²f/∂y∂x

Example: f(x, y) = x³y² + sin(xy)

First partials:

fx = 3x²y² + y·cos(xy)

fy = 2x³y + x·cos(xy)

Second partials:

fxx = ∂/∂x(3x²y² + y·cos(xy)) = 6xy² - y²·sin(xy)

fyy = ∂/∂y(2x³y + x·cos(xy)) = 2x³ - x²·sin(xy)

fxy = ∂/∂y(3x²y² + y·cos(xy)) = 6x²y + cos(xy) - xy·sin(xy)

fyx = ∂/∂x(2x³y + x·cos(xy)) = 6x²y + cos(xy) - xy·sin(xy)

Note: fxy = fyx as expected from Clairaut's theorem
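Clairaut's theorem can be observed numerically on this example: differentiating the hand-computed fx with respect to y, and the hand-computed fy with respect to x, gives two independent estimates that should both match 6x²y + cos(xy) - xy·sin(xy). The point (0.8, 1.3) is arbitrary.

```python
import math


def fx(x, y):  # ∂f/∂x for f(x, y) = x³y² + sin(xy)
    return 3 * x**2 * y**2 + y * math.cos(x * y)


def fy(x, y):  # ∂f/∂y
    return 2 * x**3 * y + x * math.cos(x * y)


def fxy_exact(x, y):
    return 6 * x**2 * y + math.cos(x * y) - x * y * math.sin(x * y)


x0, y0, h = 0.8, 1.3, 1e-5
fxy_num = (fx(x0, y0 + h) - fx(x0, y0 - h)) / (2 * h)  # differentiate fx w.r.t. y
fyx_num = (fy(x0 + h, y0) - fy(x0 - h, y0)) / (2 * h)  # differentiate fy w.r.t. x
print(fxy_num, fyx_num, fxy_exact(x0, y0))
```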

Applications of Partial Derivatives

Partial derivatives have wide-ranging applications across science, engineering, and economics.

Physics: Heat Equation

The heat equation describes how temperature spreads through a medium over time:

∂u/∂t = α(∂²u/∂x² + ∂²u/∂y² + ∂²u/∂z²)

where u(x, y, z, t) is temperature and α is thermal diffusivity.
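A minimal one-dimensional sketch, under our own simplifying assumptions: u depends only on x and t (so ∂u/∂t = α ∂²u/∂x²), both ends of the rod are held at u = 0, and time is advanced with the explicit forward-time, centred-space (FTCS) scheme. That scheme is stable only when r = αΔt/Δx² ≤ 1/2; the grid and step values below are illustrative.

```python
def heat_step(u, alpha, dt, dx):
    """One explicit (FTCS) step of ∂u/∂t = α ∂²u/∂x², with both ends held fixed."""
    r = alpha * dt / dx**2
    if r > 0.5:  # the explicit scheme is unstable beyond this limit
        raise ValueError(f"unstable step: r = {r:.3f} > 0.5")
    new = u[:]  # copying keeps the boundary values u[0] and u[-1] unchanged
    for i in range(1, len(u) - 1):
        # (u[i+1] - 2u[i] + u[i-1]) / dx² approximates ∂²u/∂x²
        new[i] = u[i] + r * (u[i + 1] - 2 * u[i] + u[i - 1])
    return new


# A rod with a hot spot in the middle and both ends kept at temperature 0.
n = 21
u = [0.0] * n
u[n // 2] = 100.0
for _ in range(50):
    u = heat_step(u, alpha=1.0, dt=0.004, dx=0.1)  # r = 0.4

print(u[n // 2])  # the peak has dropped far below its initial 100 as heat spreads
```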

Fluid Dynamics

Navier-Stokes equations for fluid flow:

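ρ(∂v/∂t + (v·∇)v) = -∇p + μ∇²v + f

where v is velocity, p is pressure, ρ is density, μ is dynamic viscosity, and f is the body force per unit volume (this is the form for an incompressible Newtonian fluid).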

Economics: Marginal Analysis

For production function Q(K, L):

∂Q/∂K: Marginal product of capital

∂Q/∂L: Marginal product of labor

Used to optimize production with limited resources.

Signal Processing

Image processing uses partial derivatives for edge detection:

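The Sobel operator approximates ∂I/∂x and ∂I/∂y with 3×3 convolution kernels; the edge strength at each pixel is the gradient magnitude:

G = √[(∂I/∂x)² + (∂I/∂y)²]

where I(x, y) is image intensity.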
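A plain-Python sketch of the Sobel gradient magnitude on a tiny synthetic image containing a vertical step from 0 to 1. The kernels are the standard 3×3 Sobel kernels; here they are applied as a correlation (without flipping), which only changes the sign of the derivative estimates, not G. Leaving border pixels at 0 is one of several common choices and is our assumption.

```python
import math

# Standard Sobel kernels: smoothed finite-difference estimates of ∂I/∂x and ∂I/∂y.
KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img):
    """Return G = sqrt(Gx² + Gy²) at each interior pixel (border pixels stay 0)."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = sum(KX[i][j] * img[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            gy = sum(KY[i][j] * img[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            out[r][c] = math.hypot(gx, gy)
    return out


# A 5x6 image with a vertical edge between columns 2 and 3.
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]
G = sobel_magnitude(image)
print(G[2])  # [0.0, 0.0, 4.0, 4.0, 0.0, 0.0]: nonzero only next to the edge
```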

Economics Application: Cobb-Douglas Production Function

Production function: Q(K, L) = A·K^α·L^β

where K = capital, L = labor, A = total factor productivity, and α, β are the output elasticities of capital and labor

Marginal products:

MPK = ∂Q/∂K = α·A·K^(α-1)·L^β

MPL = ∂Q/∂L = β·A·K^α·L^(β-1)

Interpretation:

MPK tells us approximately how much additional output one more unit of capital yields, with labor held fixed

MPL tells us approximately how much additional output one more unit of labor yields, with capital held fixed
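Numerically, the marginal product is close to the actual change in output from adding one unit of capital. The parameter values in this sketch (A = 1, α = 0.3, β = 0.7, K = 100, L = 50) are illustrative assumptions, not data.

```python
A, ALPHA, BETA = 1.0, 0.3, 0.7  # illustrative parameters (α + β = 1)


def output(K, L):
    """Cobb-Douglas production Q = A·K^α·L^β."""
    return A * K**ALPHA * L**BETA


def marginal_products(K, L):
    """Return (MPK, MPL) from the partial-derivative formulas."""
    mpk = ALPHA * A * K ** (ALPHA - 1) * L**BETA
    mpl = BETA * A * K**ALPHA * L ** (BETA - 1)
    return mpk, mpl


K, L = 100.0, 50.0
mpk, mpl = marginal_products(K, L)
extra_output = output(K + 1, L) - output(K, L)  # one more unit of capital, labor fixed
print(mpk, extra_output)  # close but not identical: MPK is a rate at the margin
```

Because α + β = 1 in this example (constant returns to scale), K·MPK + L·MPL equals Q exactly, which is Euler's theorem for homogeneous functions of degree 1.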

Optimization with Partial Derivatives

Partial derivatives are essential for finding maximum and minimum values of multivariable functions.

Critical Points

A point (a, b) is critical if:

∂f/∂x(a, b) = 0 and ∂f/∂y(a, b) = 0

or if either partial derivative does not exist.

Second Derivative Test

Let D = fxx·fyy - (fxy)², evaluated at the critical point:

• D > 0, fxx > 0: Local minimum

• D > 0, fxx < 0: Local maximum

• D < 0: Saddle point

• D = 0: Test inconclusive

Lagrange Multipliers

For constrained optimization: maximize f(x, y) subject to g(x, y) = k

Solve: ∇f = λ∇g, g(x, y) = k

where λ is the Lagrange multiplier.

Gradient Descent

Iterative optimization algorithm:

xₙ₊₁ = xₙ - γ∇f(xₙ)

where γ is the learning rate.
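A minimal gradient descent loop, applied to f(x, y) = x³ + y³ - 3xy from the example below. The starting point (1.5, 0.5), learning rate γ = 0.1, and fixed step count are arbitrary illustrative choices; gradient_descent and grad_f are our own names.

```python
def grad_f(x, y):
    """∇f for f(x, y) = x³ + y³ - 3xy."""
    return 3 * x**2 - 3 * y, 3 * y**2 - 3 * x


def gradient_descent(grad, start, gamma=0.1, steps=200):
    """Iterate xₙ₊₁ = xₙ - γ∇f(xₙ) for a fixed number of steps."""
    x, y = start
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - gamma * gx, y - gamma * gy
    return x, y


print(gradient_descent(grad_f, start=(1.5, 0.5)))  # approaches (1, 1)
```

This f has no global minimum (f(x, 0) = x³ → -∞ as x → -∞), so gradient descent can only find the local minimum near its starting point; a distant start or a larger γ can make the iterates diverge.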

Example: Find and classify critical points of f(x, y) = x³ + y³ - 3xy

Step 1: Find critical points

fx = 3x² - 3y = 0 → y = x²

fy = 3y² - 3x = 0 → x = y²

Substitute y = x² into x = y²: x = (x²)² = x⁴

x⁴ - x = x(x³ - 1) = 0 → x = 0 or x = 1

Critical points: (0, 0) and (1, 1)

Step 2: Second derivative test

fxx = 6x, fyy = 6y, fxy = -3

D = fxxfyy - (fxy)² = 36xy - 9

At (0, 0): D = 36·0·0 - 9 = -9 < 0 → Saddle point

At (1, 1): D = 36·1·1 - 9 = 27 > 0, fxx = 6 > 0 → Local minimum
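The second derivative test is mechanical enough to code directly. This sketch (classify is an illustrative name) takes the second partials evaluated at a critical point and applies the four cases of the test.

```python
def classify(fxx, fyy, fxy):
    """Second derivative test with D = fxx·fyy - fxy²."""
    D = fxx * fyy - fxy**2
    if D > 0:  # D > 0 forces fxx != 0, so its sign decides
        return "local minimum" if fxx > 0 else "local maximum"
    if D < 0:
        return "saddle point"
    return "inconclusive"


# f(x, y) = x³ + y³ - 3xy: fxx = 6x, fyy = 6y, fxy = -3
for x, y in [(0, 0), (1, 1)]:
    print((x, y), classify(6 * x, 6 * y, -3))
```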

Partial Derivatives in Machine Learning

Partial derivatives are fundamental to training neural networks through backpropagation.

Gradient Descent for Neural Networks

For loss function L(θ) where θ = (θ₁, θ₂, ..., θₙ):

θᵢ^(t+1) = θᵢ^(t) - η·∂L/∂θᵢ, with the gradient evaluated at θ^(t)

where η is the learning rate.

Backpropagation

Chain rule applied to compute gradients through network layers:

∂L/∂wᵢⱼ = (∂L/∂aⱼ)(∂aⱼ/∂zⱼ)(∂zⱼ/∂wᵢⱼ)

where wᵢⱼ is the weight connecting input i to neuron j, zⱼ is the weighted input sum of neuron j, and aⱼ is its activation.

Stochastic Gradient Descent

Uses gradient computed on random mini-batches:

θ ← θ - η∇L_B(θ)

where L_B is the loss on mini-batch B.

Each update is far cheaper than a full-batch gradient step, which makes training on large datasets practical.
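A small mini-batch SGD sketch: fitting a line y ≈ w·x + b to noise-free synthetic data at x = i/100 for i = 0 to 99. The data, learning rate η = 0.5, batch size 10, epoch count, and seed are all illustrative assumptions. Because the data lie exactly on a line, every mini-batch gradient vanishes at the true parameters, so the iterates settle at w = 2, b = 1 instead of fluctuating around them.

```python
import random


def sgd_linear_fit(xs, ys, lr=0.5, batch_size=10, epochs=200, seed=0):
    """Fit y ≈ w·x + b by mini-batch SGD on L_B = mean over B of ½(w·x + b - y)²."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)  # new random mini-batches each epoch
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            # ∂L_B/∂w and ∂L_B/∂b, averaged over the mini-batch
            gw = sum((w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            gb = sum((w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w, b = w - lr * gw, b - lr * gb  # θ ← θ - η∇L_B(θ)
    return w, b


xs = [i / 100 for i in range(100)]
ys = [2 * x + 1 for x in xs]  # true parameters: w = 2, b = 1
print(sgd_linear_fit(xs, ys))
```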

Optimizers

Advanced gradient-based optimizers:

• Adam: Adaptive moment estimation

• RMSprop: Root mean square propagation

• Adagrad: Adaptive gradient algorithm

Backpropagation Example: Simple Neural Network

Network: Single neuron with sigmoid activation

z = w·x + b

a = σ(z) = 1/(1 + e^(-z))

L = ½(y - a)² (squared-error loss for a single training example)

Gradients for backpropagation:

∂L/∂a = -(y - a)

∂a/∂z = σ(z)(1 - σ(z)) = a(1 - a)

∂z/∂w = x

∂z/∂b = 1

Chain rule:

∂L/∂w = (∂L/∂a)(∂a/∂z)(∂z/∂w) = -(y - a)·a(1 - a)·x

∂L/∂b = (∂L/∂a)(∂a/∂z)(∂z/∂b) = -(y - a)·a(1 - a)
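The derived gradients can be verified with a standard gradient check: perturb w (or b) by ±h, recompute the loss, and compare the central difference with the chain-rule value. The parameter and data values in this sketch are arbitrary.

```python
import math


def forward(w, b, x, y):
    """Single sigmoid neuron; returns (z, a, L) with L = ½(y - a)²."""
    z = w * x + b
    a = 1 / (1 + math.exp(-z))
    return z, a, 0.5 * (y - a) ** 2


def backward(w, b, x, y):
    """Chain rule: ∂L/∂a · ∂a/∂z · ∂z/∂w (or ∂z/∂b = 1)."""
    _, a, _ = forward(w, b, x, y)
    dL_da = -(y - a)
    da_dz = a * (1 - a)
    return dL_da * da_dz * x, dL_da * da_dz * 1  # (∂L/∂w, ∂L/∂b)


w, b, x, y = 0.5, -0.2, 1.5, 1.0  # arbitrary illustrative values
dw, db = backward(w, b, x, y)

# Gradient check: central differences of the loss itself.
h = 1e-6
num_dw = (forward(w + h, b, x, y)[2] - forward(w - h, b, x, y)[2]) / (2 * h)
num_db = (forward(w, b + h, x, y)[2] - forward(w, b - h, x, y)[2]) / (2 * h)
print(dw, num_dw)
print(db, num_db)
```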

Partial Derivatives Visualization

Surface plot of f(x, y) = x² + y² at the point (1, 1):

∂f/∂x = 2x = 2.0

∂f/∂y = 2y = 2.0

∇f = (2.0, 2.0)

Interpretation

The gradient vector (∂f/∂x, ∂f/∂y) points in the direction of steepest ascent.

At (1, 1) for f(x, y) = x² + y², the surface increases most rapidly in the direction (1, 1).

Practice Problems & Solutions

Problem 1: Compute all first partial derivatives of f(x, y) = x³e^y + y·ln(x)

Solution:

∂f/∂x = ∂/∂x(x³e^y) + ∂/∂x(y·ln(x)) = 3x²e^y + y/x

∂f/∂y = ∂/∂y(x³e^y) + ∂/∂y(y·ln(x)) = x³e^y + ln(x)

Problem 2: Find the gradient of f(x, y, z) = x² + yz + sin(xz) at point (1, 2, 0)

Solution:

∂f/∂x = 2x + z·cos(xz)

∂f/∂y = z

∂f/∂z = y + x·cos(xz)

At (1, 2, 0):

∂f/∂x = 2·1 + 0·cos(0) = 2

∂f/∂y = 0

∂f/∂z = 2 + 1·cos(0) = 2 + 1 = 3

∇f(1, 2, 0) = (2, 0, 3)

Problem 3: Use the chain rule to find ∂z/∂t when z = x²y, x = s² + t, y = s - t²

Solution:

∂z/∂t = (∂z/∂x)(∂x/∂t) + (∂z/∂y)(∂y/∂t)

∂z/∂x = 2xy, ∂z/∂y = x²

∂x/∂t = 1, ∂y/∂t = -2t

∂z/∂t = (2xy)(1) + (x²)(-2t) = 2xy - 2tx²

Substitute x = s² + t, y = s - t²:

∂z/∂t = 2(s² + t)(s - t²) - 2t(s² + t)²

Problem 4: Find and classify all critical points of f(x, y) = x³ + y³ - 3x - 3y

Solution:

∂f/∂x = 3x² - 3 = 0 → x² = 1 → x = ±1

∂f/∂y = 3y² - 3 = 0 → y² = 1 → y = ±1

Critical points: (1, 1), (1, -1), (-1, 1), (-1, -1)

Second derivatives: fxx = 6x, fyy = 6y, fxy = 0

D = fxxfyy - (fxy)² = 36xy

• (1, 1): D = 36 > 0, fxx = 6 > 0 → Local minimum

• (1, -1): D = -36 < 0 → Saddle point

• (-1, 1): D = -36 < 0 → Saddle point

• (-1, -1): D = 36 > 0, fxx = -6 < 0 → Local maximum

Challenge Problem

Find the maximum value of f(x, y) = xy subject to the constraint x² + y² = 1 using Lagrange multipliers.

Solution using Lagrange multipliers:

Constraint: g(x, y) = x² + y² = 1

Lagrangian: L(x, y, λ) = xy - λ(x² + y² - 1)

∂L/∂x = y - 2λx = 0 → y = 2λx

∂L/∂y = x - 2λy = 0 → x = 2λy

∂L/∂λ = -(x² + y² - 1) = 0 → x² + y² = 1

From the first two equations: y/x = 2λ = x/y (dividing is safe, since x = 0 would force y = 0, which violates the constraint), so y² = x² → y = ±x

Substitute into constraint: x² + x² = 1 → 2x² = 1 → x = ±1/√2

Critical points: (1/√2, 1/√2), (1/√2, -1/√2), (-1/√2, 1/√2), (-1/√2, -1/√2)

f values: f(±1/√2, ±1/√2) = 1/2, f(±1/√2, ∓1/√2) = -1/2

Maximum value: 1/2 at points (1/√2, 1/√2) and (-1/√2, -1/√2)
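As an independent sanity check (a brute-force scan of the constraint curve, not the Lagrange method itself), the constraint x² + y² = 1 can be parametrized as (cos θ, sin θ), so that f = xy = ½ sin 2θ, and f can be maximized over a fine grid of θ values. The grid size is an arbitrary choice.

```python
import math

# Parametrize x² + y² = 1 as (cos θ, sin θ) and scan θ on a fine grid.
n = 100_000
thetas = [2 * math.pi * k / n for k in range(n)]
theta = max(thetas, key=lambda t: math.cos(t) * math.sin(t))
x, y = math.cos(theta), math.sin(theta)
print(x * y, (x, y))  # ≈ 0.5, at a point with x = y = ±1/√2
```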