Skip to content Skip to navigation Skip to collection information

Connexions

You are here: Home » Content » The Art of the PFUG » A Brief Introduction to Computational Algebraic Geometry

Navigation

Table of Contents

Lenses

What is a lens?

Definition of a lens

Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

Who can create a lens?

Any individual member, a community, or a respected organization.

What are tags? tag icon

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

This content is ...

Affiliated with (What does "Affiliated with" mean?)

This content is either by members of the organizations listed or about topics related to the organizations listed. Click each link to see a list of all content affiliated with the organization.
  • Rice Digital Scholarship

    This collection is included in aLens by: Digital Scholarship at Rice University

    Click the "Rice Digital Scholarship" link to see all content affiliated with them.

Also in these lenses

  • Lens for Engineering

    This collection is included inLens: Lens for Engineering
    By: Sidney Burrus

    Click the "Lens for Engineering" link to see all content selected in this lens.

Recently Viewed

This feature requires Javascript to be enabled.
 

A Brief Introduction to Computational Algebraic Geometry

Module by: Evan Bullock. E-mail the author

Summary: A brief introduction to algebraic geometry through singularities of plane algebraic curves and Groebner basis computation.

This is a collection of lecture notes and problem sets from MATH 499: VIGRE Computational Algebraic Geometry at Rice University in 2010.

Plane curve singularities

A point pp is a singular point of a plane curve f(x,y)=0f(x,y)=0 if and only if f(p)=0f(p)=0 so that pp is on the curve, and

f p = f x p , f y p = 0 . f p = f x p , f y p = 0 .
(1)

In single variable calculus, for studying points where the first derivative of a function g(x)g(x) is zero, it is often helpful to study the higher derivatives. For example, if g'(a)=0g'(a)=0 and g''(a)>0g''(a)>0, then gg has a local minimum at aa. More generally, if gg is a sufficiently nice function1, then gg is represented near aa by its Taylor series centered at aa,

g ( x ) = g ( a ) + g ' ( a ) ( x - a ) + g ' ' ( a ) 2 ! ( x - a ) 2 + g ( 3 ) ( a ) 3 ! ( x - a ) 3 + ... g ( x ) = g ( a ) + g ' ( a ) ( x - a ) + g ' ' ( a ) 2 ! ( x - a ) 2 + g ( 3 ) ( a ) 3 ! ( x - a ) 3 + ...
(3)

and the behavior of gg very close to aa is completely determined (in a sense that we will make more precise in the future) by the first non-constant term of the Taylor series.

In studying functions of several variables, when f|p=0f|p=0, it also makes sense to look at higher derivatives of ff at pp. In fact, Taylor series work fine in several variables. The idea is the same as it is in the one variable case: we find a polynomial of degree nn in several variables all of whose partial derivatives up to order nn agree with those of the function ff. Letting nn, we get an infinite power series representation in several variables for f(x,y)f(x,y) centered at p=(x0,y0)p=(x0,y0) that looks like:

f ( x , y ) = f ( p ) + f x p ( x - x 0 ) + f y p ( y - y 0 ) ] = f ( p ) + 1 2 ! · 2 f x 2 p ( x - x 0 ) 2 + 2 2 ! · 2 f x y p ( x - x 0 ) ( y - y 0 ) + 1 2 ! · 2 f y 2 p ( y - y 0 ) 2 = f ( p ) + 1 3 ! · 3 f x 3 p ( x - x 0 ) 3 + 3 3 ! · 3 f x 2 y p ( x - x 0 ) 2 ( y - y 0 ) + 3 3 ! · 3 f x y 2 p ( x - x 0 ) ( y - y 0 ) 2 = f ( p ) + 1 3 ! · 3 f x 3 p ( x - x 0 ) 3 + 3 3 ! · 3 f x 2 y p ( x - x 0 ) 2 ( y - y 0 ) + 1 3 ! · 3 f y 3 p ( y - y 0 ) 3 + ... = n = 0 1 n ! k = 0 n n k n f x k y n - k p ( x - x 0 ) k ( y - y 0 ) n - k . f ( x , y ) = f ( p ) + f x p ( x - x 0 ) + f y p ( y - y 0 ) ] = f ( p ) + 1 2 ! · 2 f x 2 p ( x - x 0 ) 2 + 2 2 ! · 2 f x y p ( x - x 0 ) ( y - y 0 ) + 1 2 ! · 2 f y 2 p ( y - y 0 ) 2 = f ( p ) + 1 3 ! · 3 f x 3 p ( x - x 0 ) 3 + 3 3 ! · 3 f x 2 y p ( x - x 0 ) 2 ( y - y 0 ) + 3 3 ! · 3 f x y 2 p ( x - x 0 ) ( y - y 0 ) 2 = f ( p ) + 1 3 ! · 3 f x 3 p ( x - x 0 ) 3 + 3 3 ! · 3 f x 2 y p ( x - x 0 ) 2 ( y - y 0 ) + 1 3 ! · 3 f y 3 p ( y - y 0 ) 3 + ... = n = 0 1 n ! k = 0 n n k n f x k y n - k p ( x - x 0 ) k ( y - y 0 ) n - k .
(4)

If ff is a polynomial to start with, the resulting “Taylor series” will have only finitely many non-zero terms (Why?). For example, if we expand f(x,y)=y2-x3+12x-16f(x,y)=y2-x3+12x-16 around the singular point (2,0)(2,0), we get

f ( x , y ) = y 2 - 6 ( x - 2 ) 2 - ( x - 2 ) 3 . f ( x , y ) = y 2 - 6 ( x - 2 ) 2 - ( x - 2 ) 3 .
(5)

A computer algebra system can compute these Taylor series expansions for us. For example, the Sage command

x,y = var("x y"); taylor(x^2*y + x*y^2, (x,3), (y,-1),10)

produces the output2

y + 1 2 x - 3 + y + 1 x - 3 2 + 3 y + 1 2 + 4 y + 1 x - 3 - x - 3 2 - 5 x + 3 y + 12 y + 1 2 x - 3 + y + 1 x - 3 2 + 3 y + 1 2 + 4 y + 1 x - 3 - x - 3 2 - 5 x + 3 y + 12
(6)

where here we are asking Sage to write x2y+xy2x2y+xy2 as a series in x-3x-3 and y+1y+1. The 10 in the command is telling Sage to only compute the terms of the series up to degree 10, though we know that really we're dealing with a polynomial of degree 3, so higher degree terms in x-3x-3 and y+1y+1 couldn't possibly show up in the Taylor expansion (Why?).

Exercises

For these exercises, I would recommend using the Sage computer algebra system (http://www.sagemath.org/), a free open-source alternative to Maple, Mathematica, etc.; for example, to graph the solution set to x2+y2=2x2+y2=2 in Sage, we might use the commands:

var("x y")
implicit_plot(x^2+y^2-2, (x,-3,3), (y,-3,3)).show(aspect_ratio=1)

To get inline help on a command in Sage, use that command followed by a ?, as in:

implicit_plot?
taylor?
  1. For each of the following curves, calculate the partial derivatives and find the singular points. Then calculate the Taylor expansion for the curve about each singular point. If possible, identify each singularity as a cusp or a node. Finally plot each curve.
    1. x2=x4+y4x2=x4+y4
    2. xy=x6+y6xy=x6+y6
    3. x+y=x3-y3x+y=x3-y3
    4. x3=y2+x4+y4x3=y2+x4+y4
    5. x2y+xy2=x4+y4x2y+xy2=x4+y4
    6. y2=x3+x2y2=x3+x2
    7. (x2+y2)2=x2-y2(x2+y2)2=x2-y2
    8. x3+xy2+1=x+x2+y2x3+xy2+1=x+x2+y2
    9. y2=x5-2x4+x3y2=x5-2x4+x3
  2. Prove that y2=x3+px+qy2=x3+px+q has no singular points if and only if f(x)=x3+px+qf(x)=x3+px+q has three distinct roots. [Hint: First show that a polynomial f(x)f(x) has a multiple root at aa if and only if f(a)=f'(a)=0f(a)=f'(a)=0.]

Multiplicity and the tangent cone

We've now seen plane curves with various singularities and given a few types of them names, although we haven't give any precise mathematical definitions of these yet.

Figure 1: Some singularities we've seen
(a)
Figure 1(a) (nodeexample.png)
(b)
Figure 1(b) (cuspexample.png)
(c)
Figure 1(c) (tacnodeexample.png)

One simple question we should expect to be able to answer using calculus is what are the tangent lines to the branch or branches of the curve at the singular point? (In the node example above, we can see two distinct tangent lines in the picture, whereas in the cusp and tacnode cases, there should only be one tangent line at the singular point.)

To simplify our computations, we may as well first make a change of coordinates so that the singular point is at the origin (we can certainly do this with an affine change of variables–see the homework). Consider then a plane curve V(f)V(f), where ff is a polynomial with f(0,0)=0f(0,0)=0.3

Of course, if the origin were a smooth point, this would be easy: there is only one tangent line, defined by

f x x + f y y = 0 . f x x + f y y = 0 .
(8)

When the origin is singular, both first partial derivatives are zero and we need to look at higher order terms to find the tangent lines. In particular, let us write f=fm+fm+1+...+fnf=fm+fm+1+...+fn where fkfk is the degree kk part of the polynomial ff; here n=degfn=degf and fm0fm0 is the non-zero piece of ff of smallest degree.

Proposition If L=V(ax+by)L=V(ax+by) is a tangent line to V(f)V(f) at p=(0,0)p=(0,0), then ax+byax+by is a factor of fmfm.

Let (x(t),y(t))(x(t),y(t)) for t[0,1)t[0,1) be a local parametrization of a branch of X=V(f)X=V(f) near the origin, with x(0)=y(0)=0x(0)=y(0)=0. We assume for convenience that the branch does not have a vertical tangent, i.e. that y(t)/x(t)y(t)/x(t) is bounded as t0t0. Then the slope of the tangent line to this branch of XX at the origin should be limt0+y(t)x(t)limt0+y(t)x(t). We write

f m ( x , y ) = a 0 x m + a 1 x m - 1 y + + a m - 1 x y m - 1 + a m y m f m ( x , y ) = a 0 x m + a 1 x m - 1 y + + a m - 1 x y m - 1 + a m y m
(9)

so that for (x,y)=(x(t),y(t))(x,y)=(x(t),y(t)) with tt very small,

0 = f ( x , y ) = a 0 x m + a 1 x m - 1 y + + a m - 1 x y m - 1 + a m y m + f m + 1 ( x , y ) + + f n ( x , y ) , 0 = f ( x , y ) = a 0 x m + a 1 x m - 1 y + + a m - 1 x y m - 1 + a m y m + f m + 1 ( x , y ) + + f n ( x , y ) ,
(10)

and dividing through by xmxm, we get

0 = f ( x , y ) x m = a 0 + a 1 y x + + a m - 1 y x m - 1 + a m y x m + f m + 1 ( x , y ) x m + + f n ( x , y ) x m . 0 = f ( x , y ) x m = a 0 + a 1 y x + + a m - 1 y x m - 1 + a m y x m + f m + 1 ( x , y ) x m + + f n ( x , y ) x m .
(11)

Now, since x(t)0x(t)0 and y(t)0y(t)0 as t0+t0+ and y(t)/x(t)y(t)/x(t) stays bounded, limt0+x(t)jy(t)m+i-jx(t)m=0limt0+x(t)jy(t)m+i-jx(t)m=0 for i>0i>0 and we find that

lim t 0 + f m + 1 ( x , y ) x m + + f n ( x , y ) x m = 0 . lim t 0 + f m + 1 ( x , y ) x m + + f n ( x , y ) x m = 0 .
(12)

This implies that

lim t 0 + a 0 + a 1 y ( t ) x ( t ) + + a m - 1 y ( t ) x ( t ) m - 1 + a m y ( t ) x ( t ) m = 0 , lim t 0 + a 0 + a 1 y ( t ) x ( t ) + + a m - 1 y ( t ) x ( t ) m - 1 + a m y ( t ) x ( t ) m = 0 ,
(13)

so that the slope cc of the tangent line is a root of the polynomial a0+a1z++am-1zm-1+amzma0+a1z++am-1zm-1+amzm. Equivalently, z-cz-c is a factor of a0+a1z++am-1zm-1+amzma0+a1z++am-1zm-1+amzm and y-cxy-cx is a factor of fm=a0xm+a1xm-1y++am-1xym-1+amymfm=a0xm+a1xm-1y++am-1xym-1+amym as desired (Why?).

If we had been dealing with a vertical tangent, then we could have divided through by ymym instead, and we would find that

lim t 0 + a 0 x ( t ) y ( t ) m + a 1 x ( t ) y ( t ) m - 1 + + a m - 1 x ( t ) y ( t ) + a m = 0 , lim t 0 + a 0 x ( t ) y ( t ) m + a 1 x ( t ) y ( t ) m - 1 + + a m - 1 x ( t ) y ( t ) + a m = 0 ,
(14)

which implies that 0 must be a root of the polynomial a0wm+a1wm-1+am-1w+ama0wm+a1wm-1+am-1w+am so that xx is a factor of the polynomial fm=a0xm+a1xm-1y++am-1xym-1+amymfm=a0xm+a1xm-1y++am-1xym-1+amym.

It turns out that, at least over the complex numbers, the converse of this proposition is true as well: if ax+byax+by is a factor of fmfm, then near the origin there is a branch of the curve V(f)V(f) having V(ax+by)V(ax+by) as its tangent line at the origin.

We call that smallest degree mm of a non-zero term of ff the multiplicity of the curve X=V(f)X=V(f) at the origin. Since V(fm)V(fm) is often the union of the tangent lines to branches of the curve XX at the origin (and at least always contains those lines) we'll give it a name: we call V(fm)V(fm) the tangent cone of XX at the origin and denote it TC(0,0)(X)TC(0,0)(X).

We said that we could move a singular point of X=V(f)X=V(f) to the origin by a change of coordinates, but we could also have done everything with Taylor expansions centered at any point p=(a,b)p=(a,b): we set fkfk to be the degree kk part of ff regarded as a polynomial in x-ax-a and y-by-b, as can be computed by Taylor expansion

f k = 1 k ! j = 0 k k j n f x j y k - j p ( x - a ) j ( y - b ) k - j . f k = 1 k ! j = 0 k k j n f x j y k - j p ( x - a ) j ( y - b ) k - j .
(15)

We can then define the multiplicity mm as before as the the smallest number so that fm0fm0, and define the tangent cone to XX at pp to be TCp(X)=V(fm)TCp(X)=V(fm). One can check that this agrees with what we'd get by instead translating the curve XX so that pp goes to the origin, computing the tangent cone at the origin as above and then translating back.

One might wonder where the term “cone” is coming from here given that there is no cone in the traditional sense in sight. The answer is that the term “cone” is often used more generally to refer to any locus traced out by lines through some fixed point (the vertex of the “cone”). Here, the “base” of the tangent cone of our plane curve might be regarded as finitely many points, one for each tangent direction. While this can never really look much like a more familiar cone in the case of plane curves, it is possible for a “tangent cone” to a singular point on a surface to actually be a “cone” in the original sense (see Figure 2).

Figure 2: Tangent cone at a surface singularity
(a)
Figure 2(a) (tangentconeex1a.png)
(b)
Figure 2(b) (tangentconeex1b.png)

Exercise Show that if f(x1,...,xn)R[x1,...,xn]f(x1,...,xn)R[x1,...,xn] is a homogeneous polynomial (all of its terms have the same degree), then its zero locus V(f)V(f) is a cone with vertex (0,...,0)(0,...,0), i.e. if (c1,...,cn)V(f)(c1,...,cn)V(f) then (λc1,...,λcn)V(f)(λc1,...,λcn)V(f) for all λRλR.

If we were working over the complex numbers, the converse would be true as well: if V(f)V(f) is a cone with vertex at the origin, then the polynomial ff must be homogeneous. Why doesn't this work over the reals? Can you find a counterexample?

There's another (closely related) notion of “multiplicity,” namely the multiplicity of intersection of a curve and a line. Given a plane curve X=V(f)X=V(f) and a line LL through a point p=(a,b)Xp=(a,b)X, we can define the multiplicity of their intersection as follows: we choose a linear parametrization of LL as α(t)=(a,b)+t(c,d)α(t)=(a,b)+t(c,d) so that α(0)=pα(0)=p. The the composition f(α(t))f(α(t)) is a polynomial in tt with 0 as a root. The intersection multiplicity of XX and LL at pp is defined to be the multiplicity of 0 as a root of f(α(t))f(α(t)), i.e. the power of tt in the factorization of f(α(t))f(α(t)).

Suppose that pp is a smooth point of XX. Then we have

f ( α ( t ) ) = f ( a + t c , b + t d ) = f x p ( a + t c - a ) + f y p ( b + t d - b ) + f 2 ( a + t c , b + t d ) + + f n ( a + t c , b + t d ) = t c f x p + d f y p + t 2 F 2 ( c , d ) + t 3 F 3 ( c , d ) + t n F n ( c , d ) , f ( α ( t ) ) = f ( a + t c , b + t d ) = f x p ( a + t c - a ) + f y p ( b + t d - b ) + f 2 ( a + t c , b + t d ) + + f n ( a + t c , b + t d ) = t c f x p + d f y p + t 2 F 2 ( c , d ) + t 3 F 3 ( c , d ) + t n F n ( c , d ) ,
(16)

where Fk(c,d)=fk(a+c,b+d)Fk(c,d)=fk(a+c,b+d).4 We see then that the intersection multiplicity of XX with LL at the smooth point pp is 1 except when cfxp+dfyp=0cfxp+dfyp=0, i.e. when LL is the tangent line to XX at pp, in which case the intersection multiplicity is at least 2, and we would need to look at the higher order terms to compute it exactly.

The same computation in the case where pp is singular shows that the intersection multiplicity of XX with LL at pp is at least 2 for every line LL through pp.

Exercises

  1. In the previous exercises, you found that the following curves have only one singularity, at p=(0,0)p=(0,0), and calculated the Taylor series expansions at that point. Now, find the multiplicity of each curve at pp and find the tangent cone TCp(X)TCp(X). This should be a matter of interpreting the TaylorÕs series calculations you have already made. Sketch the curves and draw in the tangent cones.
    1. f(x,y)=x4+y4-x2f(x,y)=x4+y4-x2
    2. f(x,y)=x6+y6-xyf(x,y)=x6+y6-xy
    3. f(x,y)=y2+x4+y4-x3f(x,y)=y2+x4+y4-x3
    4. f(x,y)=x4+y4-x2y-xy2f(x,y)=x4+y4-x2y-xy2
    5. f(x,y)=x3+x2-y2f(x,y)=x3+x2-y2
    6. f(x,y)=(x2+y2)2-x2+y2f(x,y)=(x2+y2)2-x2+y2
  2. Use the same methods to find the singularities, the multiplicity at each singularity, and the tangent cones of the following curves. Since these are a bit more complicated, you will probably want to get a computer to do most of the calculations. Sketch a graph of the curve and its tangent cone near each singularity. Depending on what program you use, you may have to be careful of the behavior near singular points. Use your information from the tangent cone to interpret the behavior near singularities.
    1. f(x,y)=2x4-3x2y+y2-2y3+y4f(x,y)=2x4-3x2y+y2-2y3+y4
    2. f(x,y)=2y2(x2+y2)-3y2-x2+1f(x,y)=2y2(x2+y2)-3y2-x2+1
    3. f(x,y)=2y2(x2+y2)-2y2(x+y)-2y2-x2+2x+2yf(x,y)=2y2(x2+y2)-2y2(x+y)-2y2-x2+2x+2y
    4. f(x,y)=(x2+y2)3-4x2y2f(x,y)=(x2+y2)3-4x2y2
  3. One can think of multiplicity as measuring how “bad” a singularity is. We already showed that for a nonsingular point on curve, most lines intersect that point with multiplicity one.
    1. For the curve f(x,y)=x3-y2f(x,y)=x3-y2, show that most lines through the origin meet the curve with multiplicity 2.
    2. For the curve g(x,y)=x4+2xy2+y4g(x,y)=x4+2xy2+y4, show that most lines through the origin meet the curve with multiplicity 33.
  4. We've mentioned that we ought to be able to make a simple change coordinates so that, for example, a singular point is moved to the origin. The basic idea we were hinting at is that of affine equivalence. An affine change of coordinates is a map of the form
    φxy=abcdxy+ef,wheredetabcd0.φxy=abcdxy+ef,wheredetabcd0.
    (17)
    We can think of this as basically just a change of variables (but one which is allowed to distort angles and distances). Two curves f(x,y)f(x,y) and g(x,y)g(x,y) are affine equivalent if they differ by an affine change of coordinates φφ. That is, f(x,y)=g(φ(x,y)f(x,y)=g(φ(x,y). Show that the curves f(x,y)=y2-x3-x2f(x,y)=y2-x3-x2 and g(x,y)=x2-2xy-x-y+14-y3g(x,y)=x2-2xy-x-y+14-y3 are affine equivalent.
  5. Show that multiplicity is invariant under affine equivalence. That is, if φ:C1C2φ:C1C2 is an affine equivalence, it maps a point with multiplicity mm to a point with multiplicity mm.
  6. This problem is a little different, and its connection to plane curves or algebraic geometry may not be apparent for a while.
    1. What natural numbers nn are expressible in the form n=2x+3yn=2x+3y where xx and yy are nonnegative integers? What if we allow xx or yy to be negative?
    2. What natural numbers nn are expressible in the form n=4x+6yn=4x+6y where xx and yy are nonnegative integers? What if we allow xx or yy to be negative?
    3. What natural numbers nn are expressible in the form n=5x+8yn=5x+8y where xx and yy are nonnegative integers? What if we allow xx or yy to be negative?

Rational curves

A rational function in one variable is a function given as a quotient of polynomials

f ( t ) = p ( t ) q ( t ) f ( t ) = p ( t ) q ( t )
(18)

where p(t),q(t)R[t]p(t),q(t)R[t] are polynomials. Note that, despite the name, a rational function isn't a well-defined function at the points where q(t)=0q(t)=0.

Given a curve X=V(f)X=V(f), for fR[x,y]fR[x,y] an irreducible5 polynomial, we say that a rational parametrization of XX is a pair of rational functions x(t),y(t)x(t),y(t) so that

f x ( t ) , y ( t ) = 0 f x ( t ) , y ( t ) = 0
(19)

for all values of tt where it is defined. If XX admits a rational parametrization, we say that XX is a rational curve.

Proposition The circle X=V(x2+y2-1)X=V(x2+y2-1) is a rational curve.

To find a rational parametrization of the circle, we use the fact that a non-vertical line through a point (-1,0)(-1,0) of the circle meets the circle in exactly one other point.

Figure 3
Figure 3 (rationalcircle.png)

We can thus try to use the slope of the line through (-1,0)(-1,0) as a rational parameter for the other point of the circle that the line intersects. The line with slope tt through (-1,0)(-1,0) is defined by y=tx+ty=tx+t. Combining this with the equation x2+y2=1x2+y2=1 for the circle, we get

x 2 + ( t x + t ) 2 = 1 . x 2 + ( t x + t ) 2 = 1 .
(20)

Expanding and grouping terms, we get

( 1 + t 2 ) x 2 + 2 t 2 x + ( t 2 - 1 ) = 0 , ( 1 + t 2 ) x 2 + 2 t 2 x + ( t 2 - 1 ) = 0 ,
(21)

and we see that, as expected, x=-1x=-1 is a solution for all tt. This allows us to factor the equation as

( x + 1 ) ( 1 + t 2 ) x + t 2 - 1 = 0 ( x + 1 ) ( 1 + t 2 ) x + t 2 - 1 = 0
(22)

and we find that the xx-coordinate of the point of intersection other than (-1,0)(-1,0) is

x = 1 - t 2 1 + t 2 , x = 1 - t 2 1 + t 2 ,
(23)

and that

y = t 1 - t 2 1 + t 2 + t = 2 t 1 + t 2 . y = t 1 - t 2 1 + t 2 + t = 2 t 1 + t 2 .
(24)

We've thus found that 1-t21+t2,2t1+t21-t21+t2,2t1+t2 is a rational parametrization of the unit circle (which we could check by substituting back into the implicit equation).

Of course, in the case of the circle this may not immediately seem so exciting, since we already have a very convenient (but non-rational) parametrization by (cost,sint)(cost,sint), but it turns out the rationality (together with the nice integer coefficients) of our parametrization in this example has a fairly interesting corollary:

A Pythagorean triple is a triple of three positive integers (a,b,c)(a,b,c) such that

a 2 + b 2 = c 2 . a 2 + b 2 = c 2 .
(25)

Examples include (3,4,5)(3,4,5), (5,12,13)(5,12,13), and (15,8,17)(15,8,17). Of course, if (a,b,c)(a,b,c) is a Pythagorean triple, then so is (da,db,dc)(da,db,dc), so we may as well restrict our attention to the case where aa, bb, and cc have no common factor. A computation modulo 4 also shows that in this case cc must be odd and exactly one of aa and bb must be even, so we may as well assume bb is even.

The connection to our parametrization of the circle is that if a2+b2=c2a2+b2=c2, then

a c 2 + b c 2 = 1 , a c 2 + b c 2 = 1 ,
(26)

so that ac,bcac,bc is a point with rational coordinates on the curve x2+y2=1x2+y2=1. But our parametrization works just as well over the rational numbers as it does over the real numbers, and we know that (aside from (-1,0)(-1,0)), every point has the form 1-t21+t2,2t1+t21-t21+t2,2t1+t2 where t=mnt=mn is a rational number. This gives

a c , b c = 1 - m n 2 1 + m n 2 , 2 m n 1 + m n 2 = n 2 - m 2 n 2 + m 2 , 2 n m n 2 + m 2 . a c , b c = 1 - m n 2 1 + m n 2 , 2 m n 1 + m n 2 = n 2 - m 2 n 2 + m 2 , 2 n m n 2 + m 2 .
(27)

Thus, since we may take nn and mm to be relatively prime, as long as nn and mm aren't both odd (which would lead to a triple with bb odd instead) we have that a=n2-m2a=n2-m2, b=2nmb=2nm, and c=n2+m2c=n2+m2. This gives a sort of “parametrization” of the Pythagorean triples and in particular makes it easy to show there are infinitely many of them (and not just by multiplying by a constant).

Exercises

  1. Recall that a rational function x(t)x(t) is one of the form x(t)=p(t)q(t)x(t)=p(t)q(t), where pp and qq are polynomials. Show that the following curves are rational by finding non-constant functions x(t)x(t) and y(t)y(t) such that f(x(t),y(t))0f(x(t),y(t))0. Then use a computer to graph the curve from the implicit function and then from the parametrization to verify that they coincide (at least for some section of the curve). Hint: Try using a substitution such as t=yxt=yx or t=yx2t=yx2.
    1. f(x,y)=y2-x3f(x,y)=y2-x3
    2. f(x,y)=x2-y2-(x-2y)(x2+y2)f(x,y)=x2-y2-(x-2y)(x2+y2)
    3. f(x,y)=x5-xy2+y3f(x,y)=x5-xy2+y3
    4. f(x,y)=3x-2y-y2f(x,y)=3x-2y-y2
    5. f(x,y)=x5-x4+x2y-y2f(x,y)=x5-x4+x2y-y2
    6. f(x,y)=x2+2xy+y2-yf(x,y)=x2+2xy+y2-y
    7. f(x,y)=x2-2x-y+1f(x,y)=x2-2x-y+1
  2. A cardioid is defined by the polar equation r=1-cosθr=1-cosθ. Find an implicit polynomial equation f(x,y)=0f(x,y)=0 for the cardioid, and show that
    x(t),y(t)=(cost)(1-cost),(sint)(1-cost)x(t),y(t)=(cost)(1-cost),(sint)(1-cost)
    (28)
    is a (non-rational) parametrization of it.
  3. Recall the definition of an affine equivalence from last week. Show that affine equivalence preserves rationality. That is, show that if f(x,y)=g(φ(x,y))f(x,y)=g(φ(x,y)) for some affine equivalence φφ and V(g)={(x,y):f(x,y)=0}V(g)={(x,y):f(x,y)=0} is rational then V(f)V(f) is also rational.
    1. Show that any nonempty conic is affine equivalent to one with no constant term, i.e. a conic of the form f(x,y)=ax+by+cx2+dxy+ey2f(x,y)=ax+by+cx2+dxy+ey2.
    2. Let f(x,y)=(ax+by)+(cx2+dxy+ey2)=f1+f2f(x,y)=(ax+by)+(cx2+dxy+ey2)=f1+f2 be irreducible, where fifi is the purely degree ii part of the polynomial. Prove that V(f)V(f) is rational.
    3. Show that any irreducible conic is rational.
    4. Now, let f(x,y)f(x,y) by an irreducible degree nn polynomial such that f=fn-1+fnf=fn-1+fn, so that ff has no terms of degree less than n-1n-1. Prove that f(x,y)=0f(x,y)=0 is a rational curve.
  4. On the last homework, we began investigating the solution of equations like n=4x+6yn=4x+6y and n=5x+8yn=5x+8y. We discovered that which numbers are expressible in the form ax+byax+by for x,yx,y integers seems to have a lot to do with the the greatest common divisor of aa and bb. In fact, it turns out that the standard method of computing the g.c.d. dd of aa and bb can help us solve the equation d=ax+byd=ax+by for integers xx and yy. This computational method is called “Euclid's algorithm” and works by repeated division with remainder as follows:
    b=q1a+r1a=q2r1+r2r1=q3r2+r3rn-2=qnrn-1+rnrn-1=qn+1rn+0b=q1a+r1a=q2r1+r2r1=q3r2+r3rn-2=qnrn-1+rnrn-1=qn+1rn+0
    (29)
    The algorithm eventually terminates when it gets a zero remainder (since the remainders get smaller at each step). At that point the g.c.d. of aa and bb is known to be the last non-zero remainder rnrn.
    1. Why does Euclid's algorithm work to find the g.c.d.? [Hint: The common divisors of r1r1 and r2r2 are the same as the common divisors of r2r2 and r3r3. (Why?)]
    2. How does Euclid's algorithm allow us to write the g.c.d. rnrn in the form ax+byax+by? Use it to solve 68x+173y=168x+173y=1.
    3. In the integers modulo 173, what is the multiplicative inverse of 68?

Ideals and monomial orders

Given an algebraic plane curve f(x,y)=0f(x,y)=0, we've been looking at the problem of finding a rational parametrization (x(t),y(t))(x(t),y(t)) for it, where x(t)x(t) and y(t)y(t) are rational functions of tt. In several examples (and in a couple of general cases) we've been able to show that curves are rational and exhibit rational parametrizations.

It's natural to ask the reverse question as well: given a parametric rational curve (x(t),y(t))(x(t),y(t)), can we find a polynomial f(x,y)R[x,y]f(x,y)R[x,y] so that f(x(t),y(t))=0f(x(t),y(t))=0? (One of the homework problems asks you to do this for a non-rational parametrization of the cardioid.)

Proposition Given rational functions x(t)x(t) and y(t)y(t), there exists a polynomial f(x,y)R[x,y]f(x,y)R[x,y] such that f(x(t),y(t))0f(x(t),y(t))0.

We can write the given rational functions as x(t)=a(t)q(t)x(t)=a(t)q(t) and y(t)=b(t)q(t)y(t)=b(t)q(t) for some polynomials a(t),b(t),q(t)R[t]a(t),b(t),q(t)R[t]. For some large degree NN, we'll try to find a polynomial f(x,y)R[x,y]f(x,y)R[x,y] of degree NN so that f(x(t),y(t))0f(x(t),y(t))0, or equivalently so that

q ( t ) N ( f ( x ( t ) , y ( t ) ) 0 . q ( t ) N ( f ( x ( t ) , y ( t ) ) 0 .
(30)

Let n=degq(t)n=degq(t) and m=max{dega(t),degb(t)}m=max{dega(t),degb(t)}. Then q(t)Nf(x(t),y(t))q(t)Nf(x(t),y(t)) is a polynomial in tt of degree at most Nn+NmNn+Nm, whose coefficients are homogeneous linear functions of the coefficients of ff. Thus setting all of its coefficients equal to zero give at most Nn+Nm+1Nn+Nm+1 homogeneous linear equations for the coefficients of ff, and if we can pick NN so that there are at least as many variables as there are equations (i.e. ff has at least Nn+Nm+1Nn+Nm+1 coefficients), then we can solve the system and find an ff with q(t)Nf(x(t),y(t))0q(t)Nf(x(t),y(t))0, and we'll be done.

The degree kk part of ff is

f k ( x , y ) = c k , 0 x k + c k - 1 , 1 x k - 1 y + + c 0 , k y k , f k ( x , y ) = c k , 0 x k + c k - 1 , 1 x k - 1 y + + c 0 , k y k ,
(31)

which has k+1k+1 coefficients. Overall then, ff has 1+2++(N+1)=N(N+1)21+2++(N+1)=N(N+1)2 coefficients, and since this is quadratic in NN and Nn+Nm+1Nn+Nm+1 is linear, for sufficiently large NN we have N(N+1)2Nn+Nm+1N(N+1)2Nn+Nm+1 as desired.

While this proves that it is always possible to find a polynomial vanishing on a rationally parametrized curve, there are a few things that we might not like so much about it. For one thing, NN may be bigger than it has to be; for example, in the parametrization 1-t21+t2,2t1+t21-t21+t2,2t1+t2 for the circle, the proof would use the value N=8N=8 to find a degree 88 polynomial f(x,y)f(x,y) vanishing on the circle, i.e. the solutions to the system would be the f(x,y)=(x2+y2-1)g(x,y)f(x,y)=(x2+y2-1)g(x,y) for arbitrary polynomials g(x,y)g(x,y) of degree 66. This isn't really a big problem though, since we could always try to solve the systems for lower degrees first.

The bigger complaint we might have is that even for small degree examples, this involves solving a big system of linear equations in a big number of variables. Of course, computers are pretty good at solving systems of linear equations, but this certainly isn't something we'd want to do by hand, and even computer algebra systems might have some trouble dealing with huge numbers of variables.

In fact, there is a nicer way to do the computation, and it involves thinking about the problem more geometrically. We can think of the graph of the rational function (x(t),y(t))=a(t)p(t),b(t)q(t)(x(t),y(t))=a(t)p(t),b(t)q(t) in R3=R2×RR3=R2×R as being the common zeros of the two polynomials

g ( x , y , t ) = a ( t ) - x p ( t ) and h ( x , y , t ) = b ( t ) - y q ( t ) g ( x , y , t ) = a ( t ) - x p ( t ) and h ( x , y , t ) = b ( t ) - y q ( t )
(32)

in R[x,y,t]R[x,y,t]. Our goal then is to find a polynomial f(x,y)f(x,y) in only the variables xx and yy so that whenever g(x0,y0,t)=h(x0,y0,t)=0g(x0,y0,t)=h(x0,y0,t)=0 for some value of tt, we have f(x0,y0)=0f(x0,y0)=0. If we could write some polynomial f(x,y)R[x,y]f(x,y)R[x,y] in the form

f ( x , y ) = a ( x , y , t ) f ( x , y , t ) + b ( x , y , t ) g ( x , y , t ) , f ( x , y ) = a ( x , y , t ) f ( x , y , t ) + b ( x , y , t ) g ( x , y , t ) ,
(33)

for polynomials a(x,y,t),b(x,y,t)R[x,y,t]a(x,y,t),b(x,y,t)R[x,y,t], then f(x,y)f(x,y) would certainly have this property, and thus be a polynomial vanishing on the parametrized curve. For example,

x 2 + y 2 - 1 = 1 2 t x y + 1 2 t y - x + 1 2 y 2 - 1 1 - t 2 - ( 1 + t 2 ) x ) - 1 2 t x 2 + t x + 1 2 t + 1 2 x y + 1 2 y 2 t - ( 1 + t 2 ) y . x 2 + y 2 - 1 = 1 2 t x y + 1 2 t y - x + 1 2 y 2 - 1 1 - t 2 - ( 1 + t 2 ) x ) - 1 2 t x 2 + t x + 1 2 t + 1 2 x y + 1 2 y 2 t - ( 1 + t 2 ) y .
(34)

Over the next few weeks, we'll start learning about a computational tool called Gröbner bases that will tell us how to find such an ff, and will more generally allow us to study the question of, given polynomials h1,...hkR[x1,...,xn]h1,...hkR[x1,...,xn], which polynomials fR[x1,...,xn]fR[x1,...,xn] can be written in the form

f = q 1 h 1 + q k h k f = q 1 h 1 + q k h k
(35)

for some q1,...,qkR[x,...,xn]q1,...,qkR[x,...,xn]?

We'll begin with some terminology. If f1,...,fkR[x1,...,xn]f1,...,fkR[x1,...,xn] are polynomials then the ideal generated byf1,...,fkf1,...,fk is the set

f 1 , ... , f k = { q 1 f 1 + + q k f k : q 1 , ... , q k R [ x 1 , ... , x n ] } . f 1 , ... , f k = { q 1 f 1 + + q k f k : q 1 , ... , q k R [ x 1 , ... , x n ] } .
(36)

More generally, a non-empty subset IR[x1,...,xn]IR[x1,...,xn] is defined to be an ideal if g1f1+g2f2Ig1f1+g2f2I for every f1,f2If1,f2I and g1,g2R[x1,...,xn]g1,g2R[x1,...,xn].

Theorem (Hilbert Basis Theorem) Every ideal IR[x1,...,xn]IR[x1,...,xn] is generated by finitely many polynomials, so that

I = f 1 , ... , f k I = f 1 , ... , f k
(37)

for some f1,...,fkIf1,...,fkI.

We probably won't prove this. We should note that this theorem doesn't say anything about the size of the smallest generating set of II, so kk here could be much bigger than nn.

When dealing with polynomials in one variable, a polynomial always has a clear leading term, namely the term of highest degree. For polynomials in several variables, there are many different ways we might want to order the monomials. For convenience, if α=(a1,...,an)α=(a1,...,an) is an nn-tuple of non-negative integers, then we will write

x α = x 1 a 1 x 2 a 2 x n a n x α = x 1 a 1 x 2 a 2 x n a n
(38)

as an abbreviated notation for the corresponding monomial. Although there are many orderings on the monomials to choose from, we want them to respect the algebraic structure. For example if xαxα divides xβxβ, then we would like xαxα to be smaller than xβxβ.

A monomial order for R[x1,...,xn]R[x1,...,xn] is a total order6 on the monomials such that if xα<xβxα<xβ then xγxα<xγxβxγxα<xγxβ for all monomials xγxγ which is a well-ordering7.

Example: Lexicographic order

Probably the simplest monomial ordering is the lexicographic (or “dictionary”) ordering. In this ordering, the power of the first variable is used to determine the order, with powers of the second variable only looked at when the first variable appears to the same power in two monomials. Similarly, we only look at the third variable when the first two are tied, and so on. For example, in the lex order for R[x,y,z]R[x,y,z] with x>y>zx>y>z, we have

x 4 > x 3 y 2 z > x 3 y z 7 > x 3 y z 4 > x 2 y z 5 > x y 3 z 2 > x y > x z 2 > x > y 6 > y 5 z 3 > y z 6 > y > z 3 > 1 . x 4 > x 3 y 2 z > x 3 y z 7 > x 3 y z 4 > x 2 y z 5 > x y 3 z 2 > x y > x z 2 > x > y 6 > y 5 z 3 > y z 6 > y > z 3 > 1 .
(39)

More formally, given two monomials xαxα and xβxβ in R[x1,...,xn]R[x1,...,xn], we say that xα>lexxβxα>lexxβ if in the difference of vectors α-βα-β, the leftmost non-zero entry is positive. One can check that this does in fact define a monomial order.8

Example: Graded lexicographic order

One thing we might not like about lex order is that it doesn't respect degrees (e.g. xy>y3z4xy>y3z4). We can define a new order, called graded lexicographic order by saying that higher degree monomials are bigger and using lex order to break ties. For example,

x 7 > z 7 > x 2 y 2 z 2 > x 2 y z 3 > x y 5 > y 3 z 3 > y z 5 > x 5 > x 4 y > x 3 y 2 > x 3 y z > x 3 z 2 . x 7 > z 7 > x 2 y 2 z 2 > x 2 y z 3 > x y 5 > y 3 z 3 > y z 5 > x 5 > x 4 y > x 3 y 2 > x 3 y z > x 3 z 2 .
(40)

More formally, we say that xα>grlexxβxα>grlexxβ if degxα>degxβdegxα>degxβ or if degxα=degxβdegxα=degxβ and xα>lexxβxα>lexxβ. Since the partial ordering by degree and the lexicographic ordering both have the property that

x α < x β x γ x α < x γ x β for all monomials x γ , x α < x β x γ x α < x γ x β for all monomials x γ ,
(41)

the graded lexicographic order has this property as well. Since any graded order (an order in which degree is used first and then something else is used as a tie-breaker) satisfies well-ordering automatically (because there are only finitely many monomials of each degree), we see that grlex is a term order.

Example: Graded reverse lexicographic order

Perhaps one of the most frequently used term orders in practice (because it tends to result in faster computations) is graded reverse lexicographic or grevlex order. This one is perhaps a little more confusing. As the name suggests, graded reverse lexicographic order uses degree first, and uses “reverse lexicographic order” to break ties.

If we reverse the lexicographic order however, so that xα>revlexxβxα>revlexxβ if xα<lexxβxα<lexxβ, the result isn't a monomial ordering. It fails well-ordering, because there are infinite descending sequences, e.g. in revlex order we have

1 > revlex z > revlex z 2 > revlex z 3 > revlex > revlex y z > revlex y z 2 > revlex y z 3 > revlex . 1 > revlex z > revlex z 2 > revlex z 3 > revlex > revlex y z > revlex y z 2 > revlex y z 3 > revlex .
(42)

However, the reverse of an order preserved under multiplication by xγxγ is at least still preserved under multiplication by xγxγ, so while reverse lexicographic order isn't a monomial order by itself, we can still use it to break ties in a graded order (for which well-ordering is automatic).

There's one final issue to defining grevlex: when we reverse the lex order, it reverses the order of the variables, but we still want to get an order with x1>x2>>xnx1>x2>>xn in the end. Thus we start with a lex order with xn>>x1xn>>x1, so that when we reverse it we get a reverse lexicographic order with x1>revlexx2>revlex>revlexxnx1>revlexx2>revlex>revlexxn. We then say that xα>grevlexxβxα>grevlexxβ if degxα>degxβdegxα>degxβ or if degxα=degxβdegxα=degxβ and xα>revlexxβxα>revlexxβ.

For example, in the graded reverse lexicographic order on R[x,y,z]R[x,y,z] with x>y>zx>y>z, we have

y 2 z 2 > x 3 > x y 2 > x y z > y 2 z > x z 2 > x 2 > x y > y 2 > x z > y z > z 2 > x . y 2 z 2 > x 3 > x y 2 > x y z > y 2 z > x z 2 > x 2 > x y > y 2 > x z > y z > z 2 > x .
(43)

Basically, we order by degree first, and break ties by saying a monomial is bigger if it has a smaller power of the least significant variable. More formally, we say that xα>xβxα>xβ if degxα>degxβdegxα>degxβ or if degxα=degxβdegxα=degxβ and in the vector difference α-βα-β, the rightmost non-zero entry is negative.

Exercises

  1. In each part, determine whether the polynomial fR[x]fR[x] is in the given ideal IR[x]IR[x]. Notice that determining if ff lies in the ideal gg is equivalent to determining if gg divides ff. How do we use the same idea in (c) and (d), where I=g1,g2I=g1,g2?
    1. f(x)=x2-3x+2f(x)=x2-3x+2,        I=x-2I=x-2
    2. f(x)=x5-4x+1f(x)=x5-4x+1,        I=x3-x2+xI=x3-x2+x
    3. f(x)=x2-4x+4f(x)=x2-4x+4,        I=x4-6x2+12x-8,2x3-10x2+16x-8I=x4-6x2+12x-8,2x3-10x2+16x-8
    4. f(x)=x3-1f(x)=x3-1,               I=x9-1,x5+x3-x2-1I=x9-1,x5+x3-x2-1
  2. Find an ideal IR[x]IR[x] in which every element fIfI is divisible by xx, but such that xIxI.
    1. Show that x-y2,xy,y2=x,y2x-y2,xy,y2=x,y2.
    2. Is x-y2,xy=x2,xyx-y2,xy=x2,xy?
  3. Rewrite each of the following polynomials, ordering the terms first with the lex order, then the graded lex order, and finally the graded reverse lex order, provided that x>y>zx>y>z.
    1. f(x,y,z)=2x+3y+z+x2-z2+x3f(x,y,z)=2x+3y+z+x2-z2+x3
    2. 2x2y8-3x5yz4+xyz3-xy42x2y8-3x5yz4+xyz3-xy4
    3. 7x2y4z-2xy6+x2y27x2y4z-2xy6+x2y2
  4. Ideals make sense in the ring of integers ZZ just as they do in polynomial rings like R[x]R[x]. For example, in ZZ the ideal I=a,bI=a,b consists of all integers xa+ybxa+yb for x,yZx,yZ.
    1. Is 10 in the ideal I=3I=3?
    2. Is 2 in the ideal I=5,8I=5,8?
    3. Is -6-6 in the ideal I=12,22I=12,22?
    4. Is 3 in the ideal I=68,173I=68,173?

The division algorithm

We would like to generalize the division algorithm to polynomials in several variables. In the single variable case, it suffices to be able to divide by a single polynomial

g ( x ) = q ( x ) f ( x ) + r ( x ) g ( x ) = q ( x ) f ( x ) + r ( x )
(44)

with degr(x)<degf(x)degr(x)<degf(x), in the sense that even if we want to determine, given g(x),f1(x),f2(x)g(x),f1(x),f2(x), whether it's possible to write

g ( x ) = q 1 ( x ) f 1 ( x ) + q 2 ( x ) f 2 ( x ) g ( x ) = q 1 ( x ) f 1 ( x ) + q 2 ( x ) f 2 ( x )
(45)

we can use Euclid's algorithm to find the gcd f(x)f(x) of f1(x)f1(x) and f2(x)f2(x) and write it as f(x)=a1(x)f1(x)+a2(x)f2(x)f(x)=a1(x)f1(x)+a2(x)f2(x), and then simply divide g(x)g(x) by f(x)f(x).

Another way of saying this is that every ideal f1(x),...,fk(x)R[x]f1(x),...,fk(x)R[x] is in fact generated by a single element f1(x),...,fk(x)=f(x)f1(x),...,fk(x)=f(x), and both computing this element (using Euclid's algorithm) and testing whether g(x)g(x) is divisible by it can be carried out by dividing by single polynomials.

The situation is much more complicated when dealing with polynomials in several variables. For example, the ideal x,yR[x,y]x,yR[x,y] can not be generated by a single element, and while xx and yy have no common factors, it is not possible to write 1=q1(x,y)x+q2(x,y)y1=q1(x,y)x+q2(x,y)y. As such, we won't be able to limit ourselves to dividing by a single polynomial.

Another condition that we'll have to change in the multivariable case is our requirement that degr(x)<degf(x)degr(x)<degf(x). For example, if we tried to write

x 2 y + x y 2 + z 3 = q 1 ( x , y ) x + q 2 ( x , y ) y + r ( x , y ) , x 2 y + x y 2 + z 3 = q 1 ( x , y ) x + q 2 ( x , y ) y + r ( x , y ) ,
(46)

it seems like whatever choices of q1q1 and q2q2 we make we'll always be stuck with a z3z3 term in the remainder, which has a larger degree than the polynomials xx and yy that we're dividing by.

The division algorithm in several variables

We now describe what we can do over R[x1,...,xn]R[x1,...,xn] by essentially just following the usual division algorithm for single variable polynomials. Since the single variable division algorithm involves the leading terms of various polynomials, we fix a monomial order on R[x1,...,xn]R[x1,...,xn] and use it to define LT(f)LT(f) to be the leading term of a polynomial ff according to that monomial order.

Now suppose we are given a polynomial gg that we are trying to divide by polynomials f1,...,fkR[x1,...,xn]f1,...,fkR[x1,...,xn] to write

g = q 1 f 1 + + q n f n + r . g = q 1 f 1 + + q n f n + r .
(47)

If LT(g)LT(g) is divisible by LT(fi)LT(fi), then we can add LT(g)LT(fi)LT(g)LT(fi) to qiqi and subtract LT(g)LT(fi)fiLT(g)LT(fi)fi from gg to cancel out the leading term of gg, leaving it with a a smaller leading term. In this way, we eventually arrive at a polynomial whose leading term is not divisible by any of the LT(fi)LT(fi), and so we must move that leading term to the remainder. We continue in this way: we either cancel out LT(g)LT(g) if it is divisible by some LT(fi)LT(fi) or we just move it to the remainder if it isn't. Since the leading term decreases at each step, this process must terminate eventually with no terms left of gg, and at that point every term of the remainder we're left with will not be divisible by any of the LT(fi)LT(fi).

Theorem (Division algorithm in R[x1,...,xn]R[x1,...,xn]) Fix a monomial order. Let f1,...,fkR[x1,...,xn]f1,...,fkR[x1,...,xn] be given. Then every gR[x1,...,xn]gR[x1,...,xn] can be expressed as

g = q 1 f 1 + + q k f k + r g = q 1 f 1 + + q k f k + r
(48)

with q1,...,qk,rR[x1,...,xn]q1,...,qk,rR[x1,...,xn] and either r=0r=0 or every term of rr is not divisible by the leading term of any of the fifi. Also, the leading monomial of each qifiqifi is no greater than that of ff.

The division algorithm is useful, but it doesn't give us everything that we want by itself. For example, the polynomial y2-y3y2-y3 is in the ideal x-y2,x-y3x-y2,x-y3, but if we try to divide y2-y3y2-y3 by x-y2x-y2 and x-y3x-y3 in lex order, nothing happens: we get q1=q2=0q1=q2=0 and r=y2-y3r=y2-y3, so the division algorithm is not telling us that it is in the ideal.

For now, we'll solve this problem by defining it away. For an ideal IR[x1,...,xn]IR[x1,...,xn], we say that f1,...,fkIf1,...,fkI are a Gröbner basis for II if

L T ( I ) = L T ( f 1 ) , ... , L T ( f k ) L T ( I ) = L T ( f 1 ) , ... , L T ( f k )
(49)

where LT(I)=LT(f):fILT(I)=LT(f):fI is the monomial ideal generated by the leading terms of all the polynomials in II.

In our example above, {x-y2,x-y3}{x-y2,x-y3} is not a Gröbner basis for I=x-y2,x-y3I=x-y2,x-y3 since y2=LT(y2-y3)y2=LT(y2-y3) is in LT(I)LT(I) but not in LT(x-y2),LT(x-y3)LT(x-y2),LT(x-y3). For a Gröbner basis though, the division algorithm does what we want it to.

Proposition Fix a monomial order. Suppose f1,...,fkR[x1,...,xn]f1,...,fkR[x1,...,xn] are a Gröbner basis for the ideal I=f1,...,fnI=f1,...,fn that they generate and suppose gR[x1,...,xn]gR[x1,...,xn]. Let rr be the remainder upon division of gg by (f1,...,fk)(f1,...,fk). Then

  1. g=q1f1++qkfkg=q1f1++qkfk has a solution (q1,,qk)R[x1,...,xn]k(q1,,qk)R[x1,...,xn]k if and only if r=0r=0, and
  2. rr is unique in the sense that if g=f+r˜g=f+r˜ with fIfI and no term of r˜r˜ is divisible by any LT(fi)LT(fi), then r=r˜r=r˜.

In particular, changing the order of the fifi does not change rr.

The first statement is the special case of the second when r=0r=0. To prove the second, we just note that r-r˜Ir-r˜I, so that if rr˜rr˜, then LT(r-r˜)LT(I)LT(r-r˜)LT(I). But every term of r-r˜r-r˜, and in particular the leading term, is not divisible by any of the LT(fi)LT(fi), so that

L T ( r - r ˜ ) L T ( f 1 ) , ... , L T ( f k ) , L T ( r - r ˜ ) L T ( f 1 ) , ... , L T ( f k ) ,
(50)

which contradicts our assumption that f1,...,fkf1,...,fk is a Gröbner basis.

Thus we now “know” how to determine whether a given polynomial gg is in the ideal II: we find a Gröbner basis for II and then just use the division algorithm. At this point though, we've never even shown that any particular set of polynomials is a Gröbner basis (you're asked to show this in a very simple example on the homework). What we really want to be able to do is start with an arbitrary (finite) set of generators for an ideal and find a Gröbner basis for it.9

Exercises

  1. Determine whether x2-4x3+x2-4x-4,x3-x2-4x+4,x3-2x2-x+2x2-4x3+x2-4x-4,x3-x2-4x+4,x3-2x2-x+2.
    1. Compute the remainder on division of the polynomial f=x7y2+x3y2-y+1f=x7y2+x3y2-y+1 by the set {xy2-x,x-y3}{xy2-x,x-y3} with respect to the grlex order on R[x,y]R[x,y] with x>yx>y.
    2. Repeat, using the lex order.
  2. If I=xα(1),...,xα(s)I=xα(1),...,xα(s) is a monomial ideal, prove that a polynomial ff is in II if and only if the remainder of ff on division by xα(1),...,xα(s)xα(1),...,xα(s) is zero.
  3. For the ideal I=2xy2-x,3x2y-y-1I=2xy2-x,3x2y-y-1 with grlex order, show that 2xy2,3x2yLT(I)2xy2,3x2yLT(I).
    1. Show that {x+z,y-z}{x+z,y-z} is a Gröbner basis for lex order.
    2. Divide xyxy by the ordered set (y-z,x+z)(y-z,x+z).
    3. Now divide xyxy by (x+z,y-z)(x+z,y-z). How can your reconcile the different quotients?
  4. Show that {x-y37,x-y38}{x-y37,x-y38} is not a Gröbner basis with respect to lex order.

Buchberger's algorithm for computing Gröbner bases

Given an ideal I=f1,...,fkI=f1,...,fk, we've seen several examples now of how it is possible that f1,...,fkf1,...,fk may not be a Gröbner basis for II, i.e. we may have

L T ( f 1 ) , ... , L T ( f k ) L T ( I ) . L T ( f 1 ) , ... , L T ( f k ) L T ( I ) .
(51)

Essentially, the problem is that it may be possible to cancel out the leading terms of some of the fifi to get new elements of II with smaller leading terms.

Let's look at an example. Consider the ideal I=f1,f2R[x,y,z]I=f1,f2R[x,y,z] in graded lex order, with f1=x2y+y2zf1=x2y+y2z and f2=xy2+z2f2=xy2+z2. We can try to cancel out the leading terms of f1f1 and f2f2 in hopes of getting a polynomial in II with a new leading monomial:

f 3 = y f 1 - x f 2 = y ( x 2 y + y 2 z ) - x ( x y 2 + z 2 ) = y 3 z - x z 2 . f 3 = y f 1 - x f 2 = y ( x 2 y + y 2 z ) - x ( x y 2 + z 2 ) = y 3 z - x z 2 .
(52)

To potentially find more new leading terms of II that aren't in x2y,xy2,y3zx2y,xy2,y3z, we might, for example try to attempt the same sort of cancellation on the leading terms of f1f1 and f3f3, giving

y 2 z f 1 - x 2 f 3 = y 2 z ( x 2 y + y 2 z ) - x 2 ( y 3 z - x z 2 ) = y 4 z 2 + x 3 z 2 , y 2 z f 1 - x 2 f 3 = y 2 z ( x 2 y + y 2 z ) - x 2 ( y 3 z - x z 2 ) = y 4 z 2 + x 3 z 2 ,
(53)

whose leading term is already known to be in LT(I)LT(I) since it is divisible by LT(f3)LT(f3). However, we can divide it by f3f3 (along with f1f1 and f2f2) to possibly get a remainder in II with a new leading term

y 4 z 2 + x 3 z 2 = y z f 3 + ( x 3 z 2 + x y z 3 ) , y 4 z 2 + x 3 z 2 = y z f 3 + ( x 3 z 2 + x y z 3 ) ,
(54)

and we see that f4=x3z2+xyz3If4=x3z2+xyz3I so that x3z2LT(I)x3z2LT(I) is a new leading term which is not divisible by any of x2y,xy2,y3zx2y,xy2,y3z.

What we're doing is looking at pairs of polynomials in our current list of generators, fifi and fjfj and cancelling out their leading terms, and then dividing the result by our current list of generators to potentially get an element of the ideal with a new leading term. It turns out that if we keep doing this until we no longer get anything new in this way out of any pair (fi,fj)(fi,fj) of our current generators, then we can stop this process and our current list of generators is a Gröbner basis.

More precisely, given monomials xαxα and xβxβ, with α=(α1,...,αn)α=(α1,...,αn) and β=(β1,...,βn)β=(β1,...,βn) in R[x1,...,xn]R[x1,...,xn], the least common multiple of xαxα and xβxβ is xγxγ where γi=max{αi,βi}γi=max{αi,βi}. If xγxγ is the LCM of the leading monomials of ff and gg, then we say that the SS-polynomial of ff and gg is the polynomial

S ( f , g ) = x γ L T ( f ) f - x γ L T ( g ) g . S ( f , g ) = x γ L T ( f ) f - x γ L T ( g ) g .
(55)

This is more precisely what we mean above by “cancelling the leading terms” of fifi and fjfj.

Thus to find a Gröbner basis, we claim that all we need to do is start with some generating set of polynomials G={fi}G={fi}, and then keep computing S(fi,fj)S(fi,fj) for pairs of polynomials in our set and dividing the SS-polynomial by the set GG, and adding the remainder to GG if it isn't zero (and hence its leading term isn't in LT(G)LT(G) yet). Eventually this process will stop10 when division of S(fi,fj)S(fi,fj) by the set GG yields a zero remainder for every pair fi,fjGfi,fjG, and once that happens, GG is a Gröbner basis.

This algorithm for computing a Gröbner basis is called Buchberger's Algorithm, and it relies on the following theorem (for a proof, see Cox, Little, and O'Shea, 2.6):

Theorem (Buchberger's criterion) Let G={f1,...fk}R[x1,...,xn]G={f1,...fk}R[x1,...,xn] be a set of polynomials and I=GI=G be the ideal they generate. Then GG is a Gröbner basis for II if and only if for every pair 1i,jk1i,jk, the remainder on division of S(fi,fj)S(fi,fj) by GG is zero.

In our above example, it turns out though that we would have been better off first trying to cancel the leading terms of f2f2 and f3f3: the polynomial

f 5 = y z f 2 - x f 3 = y z ( x y 2 + z 2 ) - x ( y 3 z - x z 2 ) = x 2 z 2 + y z 3 f 5 = y z f 2 - x f 3 = y z ( x y 2 + z 2 ) - x ( y 3 z - x z 2 ) = x 2 z 2 + y z 3
(56)

has a leading term which properly divides that of f4f4, and in fact we see that once we have f5If5I, the polynomial f4f4 is redundant since f4=xf5f4=xf5. We must then check the SS-polynomials of f5f5 with f1f1, f2f2, and f3f3 (this is enough since we've already looked at every pair from f1,f2,f3f1,f2,f3):

S ( f 1 , f 5 ) = z 2 ( x 2 y + y 2 z ) - y ( x 2 z 2 + y z 3 ) = 0 , S ( f 2 , f 5 ) = x z 2 ( x y 2 + z 2 ) - y 2 ( x 2 z 2 + y z 3 ) = - y 3 z 3 + x z 4 = - z 2 f 3 , S ( f 3 , f 5 ) = x 2 z ( y 3 z - x z 2 ) - y 3 ( x 2 z 2 + y z 3 ) = - y 4 z 3 - x 3 z 3 = - y z 2 f 3 - x z f 5 , S ( f 1 , f 5 ) = z 2 ( x 2 y + y 2 z ) - y ( x 2 z 2 + y z 3 ) = 0 , S ( f 2 , f 5 ) = x z 2 ( x y 2 + z 2 ) - y 2 ( x 2 z 2 + y z 3 ) = - y 3 z 3 + x z 4 = - z 2 f 3 , S ( f 3 , f 5 ) = x 2 z ( y 3 z - x z 2 ) - y 3 ( x 2 z 2 + y z 3 ) = - y 4 z 3 - x 3 z 3 = - y z 2 f 3 - x z f 5 ,
(57)

and we see that the remainders are all zero, so {f1,f2,f3,f5}{f1,f2,f3,f5} is a Gröbner basis for II.11

Elimination of variables

We may want to try to find elements of an ideal IR[x1,...,xn]IR[x1,...,xn] which only involve some of the variables xi,...,xnxi,...,xn. For example, to find an implicit equation for the curve given parametrically by a(t)p(t),b(t)q(t)a(t)p(t),b(t)q(t), we would like to find an element of the ideal p(t)x-a(t),q(t)y-b(t)p(t)x-a(t),q(t)y-b(t) that only involves xx and yy and not tt.

It turns out that to do this, all we need to do is find a Gröbner basis for the ideal in Lex order. This is because in Lex order, if the leading term of a polynomial only involves the variables xi,...,xnxi,...,xn, then in fact all of its terms involve only xi,...,xnxi,...,xn. Thus we have

L T ( I R [ x i , ... , x n ] ) = L T ( I ) R [ x i , ... , x n ] L T ( I R [ x i , ... , x n ] ) = L T ( I ) R [ x i , ... , x n ]
(58)

and if GG is a Gröbner basis for II in Lex order, then GR[xi,...,xn]GR[xi,...,xn] is a Gröbner basis (and hence a generating set!) for IR[xi,...,xn]IR[xi,...,xn].12

Thus, to “eliminate” variables from our ideal (i.e. find the polynomials in the ideal which only involve the other variables), we just need to put the variables to be eliminated first in Lex order and find a Gröbner basis for the ideal. This is something very special about Lex order: none of the other orders we've looked at are “elimination orders” in this sense.

Sample Code

Here is a sample session in Macaulay 2 computing a Gröbner basis for I=x2y+y2z,xy2+z2R[x,y,z]I=x2y+y2z,xy2+z2R[x,y,z] in graded lex order:

Macaulay 2, version 1.2
with packages: Elimination, IntegralClosure, LLLBases, PrimaryDecomposition, ReesAlgebra,
               SchurRings, TangentCone
 
i1 : R=QQ[x,y,z,MonomialOrder=>GLex]
 
o1 = R
 
o1 : PolynomialRing
 
i2 : f=x^2*y+y^2*z
 
      2     2
o2 = x y + y z
 
o2 : R
 
i3 : g=x*y^2+z^2
 
        2    2
o3 = x*y  + z
 
o3 : R
 
i4 : I=ideal(f,g)
 
             2     2      2    2
o4 = ideal (x y + y z, x*y  + z )
 
o4 : Ideal of R
 
i5 : gens gb I
 
o5 = | xy2+z2 x2y+y2z y3z-xz2 x2z2+yz3 |
 
             1       4
o5 : Matrix R  <--- R
 
i6 : h=y^4*z^2+x^3*z^2
 
      4 2    3 2
o6 = y z  + x z
 
o6 : R
 
i7 : h % I
 
o7 = 0
 
o7 : R
 
i8 : h // gens gb I
 
o8 = {3} | -xyz |
     {3} | y2z  |
     {4} | 0    |
     {4} | x    |
 
             4       1
o8 : Matrix R  <--- R

Here is some code computing the same Gröbner basis in Singular:

                     SINGULAR                             /
 A Computer Algebra System for Polynomial Computations   /   version 3-1-0
                                                       0<
     by: G.-M. Greuel, G. Pfister, H. Schoenemann        \   Mar 2009
FB Mathematik der Universitaet, D-67653 Kaiserslautern    \
> ring R = 0,(x,y,z),Dp; //Dp = glex, lp=lex, dp=grevlex
> poly f=x^2*y+y^2*z;
> poly g=x*y^2+z^2;
> ideal I=f,g;         // An "ideal" for Singular is just a list of polynomials.
> I;                   // This line is just to display I.
I[1]=x2y+y2z
I[2]=xy2+z2
> ideal gI=groebner(I);
> gI;
gI[1]=xy2+z2
gI[2]=x2y+y2z
gI[3]=y3z-xz2
gI[4]=x2z2+yz3
> poly h=y^4*z^2+x^3*z^2;  // Next we will divide h by the list of polynomials I, which
> reduce(h,I);         // gives a warning since we aren't dividing by a Groebner basis.
// ** I is no standard basis
y4z2+x3z2
> reduce(h,gI);
0

Here is some code computing the same Gröbner basis in Sage:

----------------------------------------------------------------------
| Sage Version 4.5.2, Release Date: 2010-08-05                       |
| Type notebook() for the GUI, and license() for information.        |
----------------------------------------------------------------------
sage: R.<x,y,z> = PolynomialRing(QQ,3,order='deglex')  # or degrevlex, lex, etc.
sage: f = x^2*y+y^2*z;
sage: g = x*y^2+z^2;
sage: I = (f,g)*R
sage: I
Ideal (x^2*y + y^2*z, x*y^2 + z^2) of Multivariate Polynomial Ring in x, y, z over Rational Field
sage: gI = I.groebner_basis(); gI
[x^2*z^2 + y*z^3, y^3*z - x*z^2, x^2*y + y^2*z, x*y^2 + z^2]
sage: h = y^4*z^2+x^3*z^2;
sage: h.mod(I)
0
sage: h.mod(gI)  # I'm not sure how to get Sage to just do the division algorithm
0

Exercises

You may wish to use a computer to do most of the work in the following calculations.13

    1. Determine whether of not f=xy3-z2+y5-z3f=xy3-z2+y5-z3 is in the ideal
      I=-x3+y,x2y-z.I=-x3+y,x2y-z.
      (59)
    2. Determine whether or not f=x3z-2y2f=x3z-2y2 is in the ideal
      I=xz-y,xy+2z2,y-z.I=xz-y,xy+2z2,y-z.
      (60)
    1. Find the points on the variety
      Vx2+y2+z2-1,x2+y2+z2-2x,2x-3y-z.Vx2+y2+z2-1,x2+y2+z2-2x,2x-3y-z.
      (61)
    2. Find the points on the variety
      Vx2y-z3,2xy-4z-1,z-y2,x3-4zy.Vx2y-z3,2xy-4z-1,z-y2,x3-4zy.
      (62)
    1. Find an implicit equation for the surface parametrized by:
      x=uty=1-uz=u+t-utx=uty=1-uz=u+t-ut
      (63)
    2. Find an implicit equation for the surface parametrized by:
      x=t+uy=t2+2tuz=t3+3t2ux=t+uy=t2+2tuz=t3+3t2u
      (64)

Reduced Gröbner bases

Last time we found that

{ x 2 y + y 2 z , x y 2 + z 2 , y 3 z - x z 2 , x 3 z 2 + x y z 3 , x 2 z 2 + y z 3 } { x 2 y + y 2 z , x y 2 + z 2 , y 3 z - x z 2 , x 3 z 2 + x y z 3 , x 2 z 2 + y z 3 }
(65)

was a Gröbner basis for I=x2y+y2z,xy2+z2R[x,y,z]I=x2y+y2z,xy2+z2R[x,y,z] in grlex order with x>y>zx>y>z.

Of course, we'd like to be able to say our Gröbner bases are unique. As a first step, we noticed last time that one element of the Gröbner basis was redundant: x3z2+xyz3=x(x2z2+yz)x3z2+xyz3=x(x2z2+yz), so we could remove it and still have a Gröbner basis

{ x 2 y + y 2 z , x y 2 + z 2 , y 3 z - x z 2 , x 2 z 2 + y z 3 } . { x 2 y + y 2 z , x y 2 + z 2 , y 3 z - x z 2 , x 2 z 2 + y z 3 } .
(66)

More generally, if GG is a Gröbner basis with f,gGf,gG and LT(f)LT(f) is a multiple of LT(g)LT(g), then ff will be redundant and can be removed, i.e. the set G-{f}G-{f} is still a Gröbner basis for the same ideal. (Why?)

Of course, multiplying any element of a Gröbner basis by a scalar will give a different Gröbner basis, so if we want uniqueness, we should require that each leading coefficient be 1. Since a monomial ideal certainly has a unique minimal monomial generating set, we might hope that forcing constant leading coefficients and removing redundant elements would be enough to get a Gröbner basis which is unique, but that is not quite the case. The problem is that we could still add a multiple of one generator to another. For example, for any a,bRa,bR

{ x 2 y + a x y 2 + y 2 z + a z 2 , x y 2 + z 2 , y 3 z - x z 2 , x 2 z 2 + b y 3 z + y z 3 - b x z 2 } { x 2 y + a x y 2 + y 2 z + a z 2 , x y 2 + z 2 , y 3 z - x z 2 , x 2 z 2 + b y 3 z + y z 3 - b x z 2 }
(67)

is another Gröbner basis for the same ideal as above. To avoid non-uniqueness arising in this way, we say that a reduced Gröbner basis GG is a Gröbner basis where the leading coefficient of every fGfG is 1 and no term of any fGfG is divisible by the leading term of any gGgG with gfgf.

Starting with a Gröbner basis, we can get a reduced Gröbner basis by multiplying by constants to clear any leading coefficients, throwing away any elements whose leading term is a proper multiple of another leading term, and then replacing each polynomial by the remainder upon dividing it by the rest (to clear out any terms divisible by any of the other leading terms). Moreover, reduced is all we need to impose to make our Gröbner bases unique:

Theorem Fix a term order on R[x1,...,xn]R[x1,...,xn]. Then every ideal IR[x1,...,xn]IR[x1,...,xn] has a unique reduced Gröbner basis.

To prove uniqueness, suppose that GG and G˜G˜ are two different reduced Gröbner bases for II. The set of leading terms of both GG and G˜G˜ must simply by the minimal set of monomial generators of LT(I)LT(I), so if GG'GG', it is because there is some fGfG and f˜G˜f˜G˜ with LT(f)=LT(f˜)LT(f)=LT(f˜) but ff˜ff˜. Then f-f˜If-f˜I, so the remainder of f-f˜f-f˜ upon division by GG is zero, since GG is a Gröbner basis. However, since GG and G˜G˜ are both reduced Gröbner bases, no non-leading term of ff or f˜f˜ is divisible by any leading term in GG. Since the leading terms of ff and f˜f˜ cancel in f-f˜f-f˜, no term of f-f˜f-f˜ is divisible by a leading term of GG, so we see that no actual division occurs, and the remainder is f-f˜f-f˜, so f-f˜=0f-f˜=0, a contradiction.

Exercises

  1. Let IR[x1,...,xn]IR[x1,...,xn] be an ideal.
    1. The llth elimination ideal is defined to be the set
      Il=IR[xl+1,...,xn].Il=IR[xl+1,...,xn].
      (68)
      Prove that IlIl is indeed an ideal of R[xl+1,...,xn]R[xl+1,...,xn].14
    2. Is IlIl an ideal of R[x1,...,xn]R[x1,...,xn]?
    3. Prove that the ideal Il+1R[xl+2,...xn]Il+1R[xl+2,...xn] is the first elimination ideal of IlR[xl+1,...,xn]IlR[xl+1,...,xn], i.e. that (Il)1=Il+1(Il)1=Il+1.
  2. Consider the system of equations
    x2+2y2=3x2+xy+y2=3.x2+2y2=3x2+xy+y2=3.
    (69)
    1. Let I=x2+2y2-3,x2+xy+y2-3I=x2+2y2-3,x2+xy+y2-3. Find Gröbner bases for IQ[x]IQ[x] and IQ[y]IQ[y].
    2. Find all solutions of the system over the complex numbers CC.
    3. Which of the solutions are rational; that is, which solutions like in Q2Q2?
  3. Find all rational solutions (x,y)Q2(x,y)Q2 and all complex solutions (x,y)C2(x,y)C2 of the system
    x2+2y2=2x2+xy+y2=2.x2+2y2=2x2+xy+y2=2.
    (70)
  4. Consider the system of equations
    t2+x2+y2+z2=0t2+2x2-xy-z2=0t+y3-z3=0.t2+x2+y2+z2=0t2+2x2-xy-z2=0t+y3-z3=0.
    (71)
    Suppose we want to eliminate tt. Let
    I=t2+x2+y2+z2,t2+2x2-xy-z2,t+y3-z3R[t,x,y,z]I=t2+x2+y2+z2,t2+2x2-xy-z2,t+y3-z3R[t,x,y,z]
    (72)
    be the corresponding ideal.
    1. Use lex order with t>x>y>zt>x>y>z to compute a Gröbner basis for II, and then find a bass for IR[x,y,z]IR[x,y,z]. You should get four generators, one of which has total degree 12.
    2. Compute a reduced Gröbner basis for the ideal IR[x,y,z]IR[x,y,z] in grevlex order. This time, you should get a set of two generators.

Resultants

We've recently seen how Gröbner bases in lex order allow us to “eliminate” variables; for example, given two plane curves f(x,y)f(x,y) and g(x,y)g(x,y), we expect their intersection to consist of finitely many points (unless they have a common factor), and to find those points, we find a polynomial in yy alone (i.e. eliminate xx) in the ideal f,gf,g whose roots are then the yy-coordinates of the intersection points of ff and gg.

It turns out that, at least in this special case, there was an earlier (19th century) approach to the problem without using Gröbner bases, called resultants. Given two polynomials

f ( t ) = a n t n + a n - 1 t n - 1 + + a 1 t + a 0 , and g ( t ) = b m t m + b m - 1 t m - 1 + + b 1 t + b 0 f ( t ) = a n t n + a n - 1 t n - 1 + + a 1 t + a 0 , and g ( t ) = b m t m + b m - 1 t m - 1 + + b 1 t + b 0
(73)

of degrees nn and mm respectively, we already know how to determine whether ff and gg have a common factor: we simply preform Euclid's algorithm on ff and gg (or, to say the same thing in a rather silly way, we compute a Gröbner basis for the ideal f,gf,g). If the coefficients of ff or gg were changed, however, we would have to start over with Euclid's algorithm, i.e. we can't just preform Euclid's Algorithm on the general polynomials of degree nn and mm. It might be useful then if it were possible to write down (for fixed nn and mm) a polynomial in the coefficients of ff and gg which is zero precisely when ff and gg have a common factor.

In order to do this, we look again at a question we've studied before: given polynomials f(t)f(t) and g(t)g(t) of degrees nn and mm respectively and another polynomial h(t)h(t), what are the solutions to the equation

u f + v g = h u f + v g = h
(74)

for polynomials u(t)u(t) and v(t)v(t)? Let us first assume that ff and gg have no common factors. Then it is possible to solve the equation

u ˜ f + v ˜ g = 1 u ˜ f + v ˜ g = 1
(75)

and hence we may solve the equation for any polynomial hh: certainly (u0,v0)=(hu˜,hv˜)(u0,v0)=(hu˜,hv˜) is a solution. To find all the solutions, we just note that any two solutions must differ from one another by a solution (c(t),d(t))(c(t),d(t)) to the equation

c f + d g = 0 . c f + d g = 0 .
(76)

This equation, however, is much easier to solve: we can rewrite it as cf=-dgcf=-dg and since ff and gg have no factors in common, this means that the solutions to this equation are (c,d)=(qg,-qf)(c,d)=(qg,-qf),15 and that the general solution to the original equation is uq=u0+qguq=u0+qg and vq=v0-qfvq=v0-qf, i.e.

( u 0 + q g ) f + ( v 0 - q f ) g = h ( u 0 + q g ) f + ( v 0 - q f ) g = h
(79)

parametrized by an arbitrary polynomial qq.

If we want to find a solution that minimizes the degree of v=v0-qfv=v0-qf, we simply note that there are unique polynomials qq and rr such that v0=qf+rv0=qf+r and degr<degf=ndegr<degf=n by the division algorithm. Thus we see that there is a unique solution to the original equation in which degv<ndegv<n. Similarly, it can be shown that there is a unique solution in which degu<mdegu<m. Moreover, if degh<n+mdegh<n+m, then these two solutions are the same, since if degv<ndegv<n, then degvg<n+mdegvg<n+m so deguf=deg(h-vg)<n+mdeguf=deg(h-vg)<n+m, and degu<mdegu<m as well.

Proposition Let f,gk[t]f,gk[t] be polynomials over a field kk of degrees n>0n>0 and m>0m>0, respectively. Then for any hk[t]hk[t] of degree less than n+mn+m, there are unique polynomials u,vk[t]u,vk[t] with degu<mdegu<m and degv<ndegv<n such that

u f + v g = h u f + v g = h
(80)

if and only if ff and gg have no common factors.

We've just shown that if ff and gg are relatively prime, then the above equation has a unique solution. We are left with the “only if” part: we must show ff and gg do have a common factor, then either the existence or the uniqueness of the solutions must fail.

In fact, both fail. Existence fails because we can not solve the equation uf+vg=1uf+vg=1, since any common factor of ff and gg must also divide 1. Uniqueness fails even for hh where a solution exists because if dd is a common factor of ff and gg, then gdf+-fdg=0gdf+-fdg=0 is another solution to cf+dg=0cf+dg=0 with degu<mdegu<m and degv<ndegv<n.

Let VkVk be the vector space of polynomials of degree kk or less. Then VkVk has dimension k+1k+1, since 1,t,t2,...,tk1,t,t2,...,tk is basis. Multiplication by ff defines a linear transformation from Vm-1Vm-1 to Vn+m-1Vn+m-1. With respect to these standard bases for Vm-1Vm-1 and Vn+m-1Vn+m-1, the (n+m)×m(n+m)×m matrix

a 0 0 0 0 a 1 a 0 a 2 a 1 0 a 2 a 0 0 a 1 a 0 a 2 a 1 a n - 1 a 2 a n a n - 1 0 a n 0 a n - 1 a n a n - 1 0 0 0 a n a 0 0 0 0 a 1 a 0 a 2 a 1 0 a 2 a 0 0 a 1 a 0 a 2 a 1 a n - 1 a 2 a n a n - 1 0 a n 0 a n - 1 a n a n - 1 0 0 0 a n
(81)

represents multiplication by ff. Similarly, multiplication by gg defines a linear transformation from Vn-1Vn-1 to Vn+m-1Vn+m-1, represented by the (n+m)×n(n+m)×n matrix

b 0 0 0 0 b 1 b 0 0 b 2 b 1 b 0 0 b 2 b 1 0 0 b m - 1 b 2 b 0 0 0 b m b m - 1 b 1 b 0 0 0 b m b m - 1 b 2 b 1 b 0 0 0 b m b 2 b 1 0 0 b m - 1 b 2 0 b m b m - 1 0 b m b m - 1 0 0 0 b m b 0 0 0 0 b 1 b 0 0 b 2 b 1 b 0 0 b 2 b 1 0 0 b m - 1 b 2 b 0 0 0 b m b m - 1 b 1 b 0 0 0 b m b m - 1 b 2 b 1 b 0 0 0 b m b 2 b 1 0 0 b m - 1 b 2 0 b m b m - 1 0 b m b m - 1 0 0 0 b m
(82)

with respect to the standard bases for Vn-1Vn-1 and Vn+m-1Vn+m-1. Now, let Vm-1Vn-1Vm-1Vn-1 be the vector space of pairs of polynomials (u,v)(u,v) where degum-1degum-1 and degvn-1degvn-1. The vector space Vm-1Vn-1Vm-1Vn-1 has dimension m+nm+n, since

( 1 , 0 ) , ( t , 0 ) , ( t 2 , 0 ) , ... , ( t m - 1 , 0 ) , ( t m , 0 ) , ( 0 , 1 ) , ( 0 , t ) , ( 0 , t 2 ) , ... , ( 0 , t n - 1 ) , ( 0 , t n ) ( 1 , 0 ) , ( t , 0 ) , ( t 2 , 0 ) , ... , ( t m - 1 , 0 ) , ( t m , 0 ) , ( 0 , 1 ) , ( 0 , t ) , ( 0 , t 2 ) , ... , ( 0 , t n - 1 ) , ( 0 , t n )
(83)

is a basis; this is essentially the standard basis for Vm-1Vm-1 followed by the standard basis for Vn-1Vn-1. The function T(u,v)=uf+vgT(u,v)=uf+vg defines a linear transformation from Vm-1Vn-1Vm-1Vn-1 to Vn+m-1Vn+m-1 whose matrix with respect to the above basis on Vm-1Vn-1Vm-1Vn-1 and the standard basis on Vn+m-1Vn+m-1 is the (n+m)×(n+m)(n+m)×(n+m) matrix

Syl ( f , g , t ) = a 0 0 0 0 b 0 0 0 0 a 1 a 0 0 b 0 0 a 2 a 1 0 b m - 3 b 0 0 a 3 a 2 a 0 0 b m - 2 b m - 3 0 0 a 3 a 1 a 0 b m - 1 b m - 2 b m - 3 b 0 0 0 a n - 2 a 2 a 1 b m b m - 1 b m - 2 b 0 0 a n - 1 a n - 2 a 3 a 2 0 b m b m - 1 b m - 3 b 0 a n a n - 1 a 3 0 b m b m - 2 b m - 3 0 a n a n - 2 0 0 b m - 1 b m - 2 b m - 3 0 a n - 1 a n - 2 0 0 b m b m - 1 b m - 2 0 a n a n - 1 0 0 0 0 b m b m - 1 0 0 0 a n 0 0 0 0 0 b m Syl ( f , g , t ) = a 0 0 0 0 b 0 0 0 0 a 1 a 0 0 b 0 0 a 2 a 1 0 b m - 3 b 0 0 a 3 a 2 a 0 0 b m - 2 b m - 3 0 0 a 3 a 1 a 0 b m - 1 b m - 2 b m - 3 b 0 0 0 a n - 2 a 2 a 1 b m b m - 1 b m - 2 b 0 0 a n - 1 a n - 2 a 3 a 2 0 b m b m - 1 b m - 3 b 0 a n a n - 1 a 3 0 b m b m - 2 b m - 3 0 a n a n - 2 0 0 b m - 1 b m - 2 b m - 3 0 a n - 1 a n - 2 0 0 b m b m - 1 b m - 2 0 a n a n - 1 0 0 0 0 b m b m - 1 0 0 0 a n 0 0 0 0 0 b m
(84)

which we call the Sylvester matrix. The determinant of the Sylvester matrix we call the resultant:

Res ( f , g , t ) = det ( Syl ( f , g , t ) ) Res ( f , g , t ) = det ( Syl ( f , g , t ) )
(85)

of ff and gg with respect to the variable tt. The resultant is a polynomial in the coefficients of ff and gg with integer coefficients.

The determinant of a matrix is non-zero precisely when the corresponding linear transformation is one-to-one (and equivalently, if and only if the linear transformation is onto). Thus, since we know that solutions to uf+vg=huf+vg=h with uVm-1uVm-1 and vVn-1vVn-1 exist (and are unique) for all hVn+m-1hVn+m-1 precisely when ff and gg have no common factor, we have the following:

Theorem Suppose that f,gk[t]f,gk[t] are polynomials over a field kk of degrees n>0n>0 and m>0m>0 respectively. Then Res(f,g,t)=0Res(f,g,t)=0 if and only if ff and gg have a common factor in k[t]k[t].

One thing we need to be careful of is that this theorem only applies when the degrees of ff and gg are actually nn and mm. If an=bm=0an=bm=0, applying the resultant as if ff and gg had degree nn and mm will always yield zero (Why?) even though ff and gg may not have a common factor.

Remark There's another way to define the resultant. If we assume that f,gC[t]f,gC[t] are monic polynomials, then by the fundamental theorem of algebra, we can factor them over the complex numbers as

f ( t ) = ( t - α 1 ) ( t - α 2 ) ( t - α 3 ) ( t - α n - 1 ) ( t - α n ) , g ( t ) = ( t - β 1 ) ( t - β 2 ) ( t - β m ) . f ( t ) = ( t - α 1 ) ( t - α 2 ) ( t - α 3 ) ( t - α n - 1 ) ( t - α n ) , g ( t ) = ( t - β 1 ) ( t - β 2 ) ( t - β m ) .
(86)

Then if we form the product

R ( f , g , t ) = j = 1 n k = 1 m ( α j - β k ) R ( f , g , t ) = j = 1 n k = 1 m ( α j - β k )
(87)

then it will certainly have the property that R(f,g,t)=0R(f,g,t)=0 if and only if ff and gg have a root (or equivalently, a factor) in common. Also, it's easy to see that permuting the αjαj or permuting the βkβk has no effect on this product. It turns out that this means it's possible to rewrite R(f,g,t)R(f,g,t) as a polynomial in the coefficients of ff and gg,16 and in fact R(f,g,t)=Res(f,g,t)R(f,g,t)=Res(f,g,t). On the other hand, our original definition of the resultant Res(f,g,t)Res(f,g,t) is a polynomial in the coefficients ai,bjai,bj.

Exercises

  1. Compute the resultant of f(x)=x5-3x4-2x3+3x2+7x+6f(x)=x5-3x4-2x3+3x2+7x+6 and g(x)=x4+x2+1g(x)=x4+x2+1. Do these polynomials have a common factor in Q[x]Q[x]?
  2. Let f,gC[x]f,gC[x] be polynomials of degree 3. Verify directly in this case that the following are equivalent:
    1. The polynomials ff and gg have a non-constant common factor.
    2. Res(f,g,x)=0Res(f,g,x)=0.
  3. Resultants give us another method of finding an implicit polynomial equation for a parametric rational curve, in the following way: given a rational curve defined parametrically by
    (x,y)=a(t)p(t),b(t)q(t),(x,y)=a(t)p(t),b(t)q(t),
    (89)
    we can find an implicit equation by calculating the resultant Res(f,g,t)Res(f,g,t) of the polynomials f=a(t)-xp(t)f=a(t)-xp(t) and g=b(t)-yq(t)g=b(t)-yq(t), where we are regarding ff and gg as polynomials in tt with coefficients in C[x,y]C[x,y].
    1. Show that Res(f,g,t)Res(f,g,t), as a polynomial in xx and yy, vanishes on the parametric curve.
    2. Use this method to find an implicit equation for the following curves:
      1. x=t2x=t2,   y=t2(t+1)y=t2(t+1).
      2. x=t-1t2x=t-1t2,   y=t-1y=t-1.
      3. x=t(t2+1)t4+1x=t(t2+1)t4+1,   y=t(t2-1)t4+1y=t(t2-1)t4+1.
    3. Use the Gröbner basis method to find implicit equations for the above parametric curves and check that they define the same curves.
  4. Consider the following polynomials in k[x,y]k[x,y]:
    f=x2y-3xy2+x2-3xyg=x3y+x3-4y2-3y+1f=x2y-3xy2+x2-3xyg=x3y+x3-4y2-3y+1
    (90)
    1. Compute Res(f,g,x)Res(f,g,x).
    2. Compute Res(f,g,y)Res(f,g,y).
    3. What do your answers from (a) and (b) tell you about ff and gg?

Resultants and symmetric polynomials

Last time we claimed (without proof) that the resultant of two monic polynomials could be defined in terms of the roots. Of course, we can do the same with polynomials which aren't monic: if f(t)=antn++a0f(t)=antn++a0 and g(t)=bmtm++b0g(t)=bmtm++b0 are polynomials with complex coefficients, then we may write

f ( t ) = a n ( t - α 1 ) ( t - α 2 ) ( t - α 3 ) ( t - α n - 1 ) ( t - α n ) g ( t ) = b m ( t - β 1 ) ( t - β 2 ) ( t - β m ) f ( t ) = a n ( t - α 1 ) ( t - α 2 ) ( t - α 3 ) ( t - α n - 1 ) ( t - α n ) g ( t ) = b m ( t - β 1 ) ( t - β 2 ) ( t - β m )
(91)

where the αiαi are the roots of ff and the βjβj are the roots of gg, counted with multiplicities. In the Sylvester matrix, replacing ff and gg by the monic polynomials with the same roots corresponds to dividing the first mm columns by anan and dividing the first nn columns by bmbm. Thus the general formula for the resultant in terms of the roots and anan and bmbm should be

R ( f , g , t ) = a n m b m n i = 1 n j = 1 m ( α i - β j ) = a n m i = 1 n g ( α i ) R ( f , g , t ) = a n m b m n i = 1 n j = 1 m ( α i - β j ) = a n m i = 1 n g ( α i )
(92)

Now, consider the case of the discriminant, i.e. take g=f'g=f'. We have

f ' ( t ) = a n k j k ( t - α j ) f ' ( t ) = a n k j k ( t - α j )
(93)

by the product rule. Thus we have

R ( f , f ' , t ) = a n n - 1 i f ' ( α i ) = a n n - 1 i a n k j k ( α i - α j ) = a n 2 n - 1 i k j k ( α i - α j ) = a n 2 n - 1 i j i ( α i - α j ) since if k i then j k ( α i - α j ) = 0 = a n 2 n - 1 i j ( α i - α j ) = a n 2 n - 1 ( - 1 ) n ( n - 1 ) / 2 i < j ( α i - α j ) 2 , by problem 4b on the homework R ( f , f ' , t ) = a n n - 1 i f ' ( α i ) = a n n - 1 i a n k j k ( α i - α j ) = a n 2 n - 1 i k j k ( α i - α j ) = a n 2 n - 1 i j i ( α i - α j ) since if k i then j k ( α i - α j ) = 0 = a n 2 n - 1 i j ( α i - α j ) = a n 2 n - 1 ( - 1 ) n ( n - 1 ) / 2 i < j ( α