Star Catalog

Make A New Bright Star Catalog

The source code for this project is here.

This project is about creating a custom star catalog out of existing star catalogs, using the Java programming language. It grew out of a desire to create a replacement for the Yale Bright Star Catalog, revision 5 (1991).

The main result of this project is a bright star catalog having these characteristics:

it's based on Hipparcos-2 astrometry
positions use the ICRS, with the epoch J1991.25 used for proper motion (same as Hipparcos)
full kinematics (position and velocity) are included for all stars, along with related formal errors. (Exception: pi-1 Gru has no radial velocity.)
it includes 5,112 stars with a limiting Vmag <= 6.0
it includes 68 variable stars whose variability range straddles the limiting magnitude
each record explicitly states the provenance of every field in the record
it includes the Bayer, Flamsteed, HR, and HD identifiers when available
it states the constellation for each star (with proper motion epoch of J1991.25, to be precise)
it includes a proper name for 109 stars

For almost all purposes, the ICRS can be taken as identical to J2000. Note that the epoch of proper motion (J1991.25) differs from the epoch J2000. To produce a position as of 2022, for example, you first need to apply proper motion from J1991.25 to 2022, to get the corresponding J2000 position. Then you apply precession from J2000 to 2022, to get the final result.
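
The first of those two steps can be sketched as follows. This is a minimal first-order illustration, not the project's actual code: the class and method names are hypothetical, the model is linear in time, it ignores radial velocity (perspective effects), and it breaks down near the celestial poles. The precession step is not shown.

```java
/** Hypothetical helper: first-order proper motion only (an assumption, not the project's code). */
final class ProperMotionSketch {

  static final double CATALOG_EPOCH = 1991.25;      // Julian epoch of the Hipparcos proper motions
  static final double MAS_PER_DEGREE = 3_600_000.0; // milliarcseconds in one degree

  /**
   * @param raDeg       right ascension in degrees at J1991.25
   * @param decDeg      declination in degrees at J1991.25
   * @param pmRaCosDec  proper motion in right ascension times cos(dec), in mas/year
   * @param pmDec       proper motion in declination, in mas/year
   * @param targetEpoch Julian epoch of interest, for example 2022.0
   * @return {ra, dec} in degrees at the target epoch, before any precession is applied
   */
  static double[] apply(double raDeg, double decDeg, double pmRaCosDec, double pmDec, double targetEpoch) {
    double years = targetEpoch - CATALOG_EPOCH;
    double cosDec = Math.cos(Math.toRadians(decDeg));
    // Divide by cos(dec) to turn the catalog value back into a change in the RA coordinate.
    double ra = raDeg + (pmRaCosDec / cosDec) * years / MAS_PER_DEGREE;
    double dec = decDeg + pmDec * years / MAS_PER_DEGREE;
    ra = ((ra % 360.0) + 360.0) % 360.0; // keep RA in [0, 360)
    return new double[] {ra, dec};
  }

  public static void main(String[] args) {
    // Illustrative values only, not taken from any catalog.
    double[] pos = apply(100.0, 0.0, 3600.0, -3600.0, 2001.25);
    System.out.println(pos[0] + " " + pos[1]);
  }
}
```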

There is a related project defining stick figures for constellations, as an aid to creating star charts.

Download Catalogs

In order to run the code in this project, you need to tell the program where to find catalog files. You do that by setting an argument called project-root on the command line, which points to the root of this project on your system:

-Dproject-root=C:\blah\star-catalog

You also need to download the catalog files from VizieR. They aren't included in this project. For details, see the catalogs/input directory. Each catalog has a SOURCE.utf8 file with a link to the VizieR catalog, and the name of the missing file that you need to download.
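
As a rough sketch of how a -D setting like this is seen from Java: the JVM exposes it as a system property. The class name below is hypothetical, and the assumption that catalogs/input sits directly under the project root is inferred from the description above, not taken from the project's code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: locates the catalog input directory from the project-root setting. */
final class ProjectRootSketch {

  /** Resolves catalogs/input under the given project root (assumed layout). */
  static Path inputDir(String projectRoot) {
    if (projectRoot == null || projectRoot.trim().isEmpty()) {
      throw new IllegalArgumentException("Set -Dproject-root=<path to this project> on the command line");
    }
    return Paths.get(projectRoot, "catalogs", "input");
  }

  public static void main(String[] args) {
    // -Dproject-root=... on the java command line becomes a JVM system property.
    String root = System.getProperty("project-root");
    if (root == null) {
      System.err.println("Missing -Dproject-root");
      return;
    }
    System.out.println(inputDir(root));
  }
}
```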

Sources of Data

Almost all data is taken from the Centre de Données astronomiques de Strasbourg (CDS), in France. Among the tools at the CDS are:

VizieR - a compendium of catalogs used by professional astronomers.
SIMBAD - a cross-reference tool, mapping from specific objects to various catalogs and underlying papers and bibliographies. Description of SIMBAD (as of 2000).

From the SIMBAD docs: "Simbad is not a catalogue, and should not be used as a catalogue. The CDS also provides the VizieR database which contains published lists of objects, as well as most very large surveys. The idea now is to use both Simbad and VizieR as complementary research tools." (It seems like this advice is not strictly followed by many people.)

"It is to be noted that for a double system in which the components can be observed separately, Simbad frequently includes three entries: A and B components, and an additional entry for the joint system (AB), the latter entry carrying the observational data and references related to the system as a whole."

Recurring Issues With Catalogs

Bandpass Conversion Issue

Take the Hipparcos mission as an example. Its detectors have certain custom bandpasses - Hp for Hipparcos, Bt and Vt for the Tycho instrument on board the same spacecraft. Those detectors are very sensitive, and can measure millimags. These bandpasses aren't the same as the UBV system (also known as Johnson, or Johnson-Morgan, which dates from the 1950s). Conversion formulas can be derived, but you need to understand that they are only approximate, and never quite as precise as the inputs. The loss of precision will vary according to the formula used. Here's an example which uses cubic fits, and assumes that ground-based colors are available. You may lose a decimal place with such a conversion.

So the Hipparcos catalog lists the magnitude in both its native bandpasses Hp, Bt and Vt, but it also uses a derived, calculated Vmag field, converted to the Johnson V magnitude. Vmag is to 2 decimal places, while Hp is to 4 decimal places. The values for the native bandpasses are much more precise. Casual users of the catalog can be completely unaware of the significant decrease in precision for such calculated fields.

Useful to note: "It is usually the colours of an object that are of astrophysical interest rather than the observed magnitudes themselves (ref)."

Magnitude Cutoff Issues

The above bandpass conversion issue leads to problems. Catalogs often have an explicit cutoff magnitude. If that cutoff is expressed in Johnson V, and that V is an approximate calculated field, then the cutoff will also be approximate.

Double/multiple stars also have an issue with the cutoff. If you intend to amalgamate close doubles, then there is cross-talk with the filtering by magnitude. The magnitude of each component in the system can be below the brightness limit, but the combined brightness may be above the limit. There is a formula for adding the magnitudes of two close stars.
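
The standard relation adds the fluxes of the two components, not their magnitudes: m = -2.5 log10(10^(-0.4 m1) + 10^(-0.4 m2)). A minimal sketch follows; the class and method names are hypothetical and are not taken from the project's code.

```java
/** Hypothetical helper: combined magnitude of two unresolved stars. */
final class CombinedMagnitudeSketch {

  /** Fluxes add, magnitudes do not: convert to relative flux, sum, convert back. */
  static double combine(double mag1, double mag2) {
    double flux1 = Math.pow(10.0, -0.4 * mag1);
    double flux2 = Math.pow(10.0, -0.4 * mag2);
    return -2.5 * Math.log10(flux1 + flux2);
  }

  public static void main(String[] args) {
    double limit = 6.0;
    double combined = combine(6.3, 6.3); // each component is fainter than the limit
    System.out.println(combined + " passes limit: " + (combined <= limit));
  }
}
```

Two components of magnitude 6.3 combine to about 5.55, so a pair whose members each fail a cutoff of 6.0 can pass it as a system.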

Variable stars also have magnitude cutoff issues. For a variable star, you need to know its magnitude at maximum brightness in order to filter correctly with respect to a given magnitude limit. Take for example the long-period variable Mira. In Hipparcos, Mira's Vmag is stated as 6.47 (near the middle of its range). But at its peak, Mira's magnitude is around 3.4. So if you filter Hipparcos' hip_main.dat file using Vmag < 6.0, the result will not include Mira.
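
A filter that accounts for this could look like the sketch below. It assumes that a magnitude at maximum brightness is available from some variability source; the class name, method name, and the use of null for non-variable stars are all hypothetical choices made for illustration.

```java
/** Hypothetical helper: magnitude cutoff that respects a variable star's peak brightness. */
final class VariableCutoffSketch {

  /**
   * @param vmag     catalog V magnitude
   * @param magAtMax magnitude at maximum brightness, or null if the star is not a known variable
   * @param limit    limiting magnitude of the catalog being built
   */
  static boolean passesLimit(double vmag, Double magAtMax, double limit) {
    // Smaller magnitude means brighter, so take the minimum of the two values.
    double brightest = (magAtMax == null) ? vmag : Math.min(vmag, magAtMax);
    return brightest <= limit;
  }

  public static void main(String[] args) {
    // Mira, using the values quoted in the text: Vmag 6.47, roughly 3.4 at its peak.
    System.out.println(passesLimit(6.47, null, 6.0)); // filtered on Vmag alone
    System.out.println(passesLimit(6.47, 3.4, 6.0));  // filtered on peak brightness
  }
}
```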

Multiplicity Issues

Dealing with double/multiple stars is always an issue. A catalog may have one entry (row in the catalog) for the system, or it may decide to have an entry for each component. Sometimes this breakdown can have a lot of logic (involving the separation of components and other items), which can be hard to follow. For example, see the documentation for the Hipparcos catalog.

When a system is treated as being a single entry, then the issue arises of how to characterise the brightness. There is a formula for computing the magnitude of a double star from the magnitude of its components.

The radial velocity should be explicitly attached to either a component of the system, or to its barycenter, otherwise you don't know what the radial velocity refers to.

Cross-match Issues

When you attempt to amalgamate data across two catalogs, you need a way to cross-match records. That is preferably done using an identifier that they have in common.

When you try to link one catalog to another, it's extremely likely that the join will not match 100% of the time. When you join catalog A to catalog B via a common identifier, then a non-nullable field in B usually becomes a nullable field in the result.
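
In code, this behaves like a left join whose right-hand side may be absent. The sketch below is only an illustration: the names are hypothetical, the HIP number is assumed to be the shared identifier, and the values in main are invented.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: left join of catalog A to catalog B on a shared identifier. */
final class CrossMatchSketch {

  static Map<Integer, Optional<Double>> join(Iterable<Integer> hipIdsInA, Map<Integer, Double> radialVelocityInB) {
    Map<Integer, Optional<Double>> result = new LinkedHashMap<>();
    for (Integer hip : hipIdsInA) {
      // B always has a radial velocity for its own rows, but the joined result is
      // empty for every row of A whose identifier does not appear in B.
      result.put(hip, Optional.ofNullable(radialVelocityInB.get(hip)));
    }
    return result;
  }

  public static void main(String[] args) {
    // Invented identifiers and values, for illustration only.
    Map<Integer, Double> b = new LinkedHashMap<>();
    b.put(1, 12.5);
    System.out.println(join(Arrays.asList(1, 2), b));
  }
}
```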

Strictly speaking, an identifier maps to a record in a catalog, not to a star as such. Because of the multiplicity issue, different catalogs can have different viewpoints on which part (or parts) of a multiple system to include in a record.

Data Amalgamation and Provenance

When you gather data from multiple sources, it's beneficial to keep track of the source or provenance of each piece of data.

One method of doing this might be to define a conventional ordered string of text, that matches the order of fields in the catalog. The text would use codes that identify the provenance of each field, in sequence. (This technique is used in this project.)
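
To illustrate the idea only: the field order and the one-letter codes below are invented for this sketch and are not the project's actual conventions.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: looks up the provenance code of a field in an ordered provenance string. */
final class ProvenanceSketch {

  // Invented field order; a real catalog would define its own.
  static final List<String> FIELDS = Arrays.asList("RA", "DEC", "PARALLAX", "RADIAL_VELOCITY", "VMAG");

  /** Returns the code at the same position as the field, since the two sequences are parallel. */
  static char sourceOf(String field, String provenance) {
    if (provenance.length() != FIELDS.size()) {
      throw new IllegalArgumentException("Expected " + FIELDS.size() + " codes, got " + provenance.length());
    }
    int index = FIELDS.indexOf(field);
    if (index < 0) {
      throw new IllegalArgumentException("Unknown field: " + field);
    }
    return provenance.charAt(index);
  }

  public static void main(String[] args) {
    // Invented codes: say 'A' for one source catalog, 'B' and 'C' for two others.
    System.out.println(sourceOf("RADIAL_VELOCITY", "AAABC"));
  }
}
```

A fixed-width string like this stays compact and travels with the record, at the cost of needing a separate key that explains each code.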

Yale Bright Star Catalog (BSC)

VizieR link to the most recent revision (r5)
it was originally based on Harvard Revised photometry, 1879-1906.
the 4th revision had a supplement published. See its intro.doc for the reasons for its creation.
9110 rows, and 2603 rows in the supplement

Issues:

it contains 14 unusual objects that really shouldn't be there
in the past, there has been a policy of only allowing objects that were in previous versions of the catalog, with no new objects allowed. This policy seems undesirable.
it has over 600 items with Vmag greater than its stated cutoff of 6.5. As explained in the intro.doc of the supplement, this is because of bandpass issues, that is, differences between Johnson V and the Harvard Revised (HR) photometry (1879-1906). See chart below.
its positions are only to the nearest arcsecond. Modern catalogs have positions to milliarcsecond or better.
most parallax values are missing
the documentation for the latest revision 5 states that it is a preliminary release, but no final release was ever made.
the BSC revision 4 has a supplement published, but that supplement was not released for revision 5.
its Vmag field has data from 3 different sources. Usually it's V (in the UBV system), but sometimes it's Harvard Revised photometry (HR), or even a calculated V using HR as source.
it lacks uncertainties for the core astrometric quantities.

The 5th edition was never published in book form. Here are some quotes from the printed 4th edition (1982):

"The ostensible visual magnitude limit of the HR catalogue was 6.50, and it contains 9110 objects. Even in 1908, but especially by modern standards, that magnitude limit is hazy at best. It included 695 HR stars with magnitudes fainter than 6.5V on the UBV photometric system; whereas over 200 stars of magnitude 6.00-6.50V in the modern compilations...are not included in the HR."

The mapping between Harvard Revised Photometry (HR) v and Johnson V magnitudes is shown graphically in the supplement to the 4th edition:

(Chart: Johnson versus HR magnitudes)

"Inevitably many gaps still remain in the tabulated materials.... Despite the vast accumulation of new data it is perhaps astonishing, and somewhat depressing, to note the large numbers of omissions of photoelectric magnitudes and colors, MK spectral classes, and radial velocities among the stars that have been known the longest and are presumably the easiest to observe."

"Since the third edition was published, based on the literature through 1962, it is estimated that well over 200,000 astronomical papers have been listed in the Jahresbericht and its successor, the Astronomy and Astrophysics Abstracts.... Probably well over a third of these references deal in one way or another with stars. The titles and abstracts for the years 1961 through 1979 were scanned for potentially useful data both for the REMARKS and for filling gaps in the tabulated data. Very few of the titles or abstracts, however, indicate whether or not any bright stars are involved. Hence many an interesting item may have been overlooked."

"Considering the immense amount of material scanned in the preparation of this catalogue, both further overlooked errors in other sources as well as new errors incurred by our own human fallibility will inevitably, despite all precautions, have introduced many as yet undiscovered errors into this volume. We ask the users' indulgence and will welcome unambiguous corrections."

These remarks reflect the manual way in which this catalog was traditionally maintained.

Summary of Completeness of Data (4th edition, 1982):

Item	3rd ed.	4th ed.
Photoelectric magnitudes	50%	95%
B-V Colors	50%	94%
MK Spectral Classes	75%	93%
Parallaxes	30%	31%
Radial Velocities	75%	93%

Remarks from the Supplement to revision 4:

"REMARKS are given for 49% of the stars. For a high percentage of these, various data in the literature have been found to be discordant."

"In Figures 2 - 4 the color indices (B-V), (U-B), and (R-I) are compared with the major spectral classes. The considerable dispersions reflect possible errors in spectral classification, in the colors, and color excesses, as well as differences in luminosity classes..."

In summary, given the above issues, it seems desirable to generate a replacement for the Yale BSC, using more modern sources of data.

Hipparcos

The Hipparcos mission is unusual. It has two distinct detectors, and has correspondingly generated two distinct catalogs, called Hipparcos and Tycho. The Tycho portion was added late in the planning of the mission. The Tycho catalog is larger, but has less precision.

There are a number of catalogs related to the Hipparcos mission:

HIC - the Hipparcos Input Catalog, used to plan the mission. Note that HIC has data on radial velocity, which is not present in HIP/TYC (see below).
CCDM - Catalog of Components of Double & Multiple stars (2002), also used to plan the mission. The CCDM was also updated as a result of the mission itself.
HIP - the main Hipparcos catalog, version 1 (1997)
HIP-2 - the re-processing of the main Hipparcos catalog (2007), which updated the core astrometry
TYC - the first version of the Tycho catalog (1997)
TYC-2 - the second, more precise version of Tycho (2000)
Visual Double Stars in Hipparcos (2000)
Millennium Star Atlas, published by Sky and Telescope
Celestia 2000 atlas

Hipparcos doesn't include radial velocity data, but the Hipparcos Input Catalog does.

Main documentation:

ESA main page
quick summary
list of the 50 brightest stars in Hipparcos
VizieR tables
double and multiple systems (complicated)

Hipparcos-2 (2007)

The data reduction for Hipparcos is complex and subtle (outline paper). Hipparcos-2 is a second pass at the astrometric data reduction, which improves upon the original. It incorporates a better understanding of the rotation of the spacecraft. With this new reduction, the astrometric accuracy for bright stars (Hp < 8.0) is improved by "up to a factor of 4".

Hipparcos-2 has 117955 records (VizieR), 263 fewer than the original Hipparcos data.

"What took more than 6 months some 12 years ago, takes currently about a week on a single desktop computer..."

Radial Velocity Catalogs

Hipparcos lacks radial velocity (RV) data. It's useful to look for ways to fill that gap.

Pulkovo (2006)

radial velocities for 35,495 stars, all in Hipparcos, all sky
median accuracy of the radial velocities obtained is 0.7 km/s (Gaia is ~0.5 km/s)
paper and reference
inputs from 203 publications, weighted means
7691 stars V <= 6.5
147 stars V < 3.0
uses the WEB catalog and the Barbier-Brossat/Figon catalogs as inputs

This catalog has good precision, but it is missing 1,183 of the bright stars found in Hipparcos (about 14%). If it is supplemented with Barbier-Brossat/Figon (2000) instead of BSC, only 16 stars are still missing.

General Catalog of mean radial velocities Barbier-Brossat and Figon (2000)

This catalog is more complete for the bright stars, but is less modern than Pulkovo.

has 8682 stars for Vmag <=6.5, all with HIC (same as HIP)
whereas Pulkovo has one entry per HIP identifier, this catalog can have multiple entries per HIP identifier
has 175 stars V < 3.0
extends WEB with several more years of data; 36145 stars
weighted means of N measurements
the paper is in French
4 categories for quality of the measurement, with the smallest being <= 2.5 km/s, and the largest <= 10 km/s
the relative errors are often large

Hipparcos Input Catalog (HIC)

It takes most of its RV values from WEB (Wilson, Evans, Barbier-Brossat/Figon). It has 3559 stars for Vmag <= 6.5 (missing 60%).

WEB link (1995)

WEB uses three sources:

General Catalogue of Stellar Radial Velocities (Wilson 1953), biased towards the northern hemisphere, not very many records
Evans (1978)
General Catalog of mean radial velocities Barbier-Brossat/Figon (2000).

Gaia

The cross-matching with HIP is particularly mediocre. It finds only about 70% of the desired items. They are working on this problem, but I don't know if a solution has been found yet.

Radial velocity precision for brighter objects ~0.2-0.3 km/s.

Gaia has no Johnson Vmag field.

Its Early Data Release 3 (EDR3) has 150 bright stars with G < 3.00.

Bright Stars: How Many?

Using VizieR, here is the number of records with magnitude <= 6.5 in each catalog. Different catalogs use different bandpasses, so these are not all directly comparable.

BSC Supplement   267  V
BSC            8,404  V
HD             8,524  Photovisual 
TYC            8,851  Vmag
HIP            8,874  Vmag
HIC            3,559  Vmag
Gaia EDR3     12,119  Gmag

I don't know the reason why Gaia's value is so high.

For this project, the number of records is 5112, with magnitude <= 6.0.

As above, but this time the number of records with magnitude < 3.0 (again using VizieR):

BSC Supplement  0  V
BSC           170  V
HD            147  Photovisual 
TYC           166  Vmag
HIP           172  Vmag
HIC           169  Vmag
Gaia EDR3     150  Gmag

The docs for Gaia state that there's poor support for bright stars, but the above table contradicts that statement. It seems to be missing only about 20 stars.

For this project, the number of records is 173 with magnitude < 3.0. (This is 1 greater than the HIP result stated above because of HIP's Vmag for Omicron Ceti.)

Bright Stars: Candidate Data

Different catalogs address different needs. Here are the core items that interest me:

right ascension
declination
trigonometric parallax
proper motion α * cosδ
proper motion δ
radial velocity
is the star multiple, or part of a multiple system?
is the star's magnitude variable?
Johnson V
Johnson B-V

Having 3 pieces of data for position and 3 for velocity means that the full 3D motion is specified (full kinematics).
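
As a rough illustration of how these six quantities combine, the sketch below derives distance, tangential velocity, and total space velocity. The names are hypothetical. The distance is the naive inverse of the parallax, which is only reasonable when the relative error on the parallax is small, and the factor 4.74047 is one astronomical unit per year expressed in km/s.

```java
/** Hypothetical helper: basic kinematics from parallax, proper motion, and radial velocity. */
final class KinematicsSketch {

  static final double KM_S_PER_AU_YR = 4.74047; // 1 au/year in km/s

  /** Naive distance in parsecs from a parallax in milliarcseconds. */
  static double distanceParsecs(double parallaxMas) {
    return 1000.0 / parallaxMas;
  }

  /** Tangential velocity in km/s from proper motion (mas/year) and parallax (mas). */
  static double tangentialKmPerSec(double pmRaCosDecMas, double pmDecMas, double parallaxMas) {
    double totalProperMotion = Math.hypot(pmRaCosDecMas, pmDecMas); // mas/year
    return KM_S_PER_AU_YR * totalProperMotion / parallaxMas;
  }

  /** Total space velocity in km/s; the tangential and radial parts are perpendicular. */
  static double spaceKmPerSec(double tangentialKmPerSec, double radialKmPerSec) {
    return Math.hypot(tangentialKmPerSec, radialKmPerSec);
  }

  public static void main(String[] args) {
    // Illustrative values only, not taken from any catalog.
    double vt = tangentialKmPerSec(30.0, 40.0, 100.0);
    System.out.println(distanceParsecs(100.0) + " pc, " + spaceKmPerSec(vt, 10.0) + " km/s");
  }
}
```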

Other data of interest:

cross-match identifiers from other catalogs (Bayer, Flamsteed, HD, SAO)
spectral type; there can be considerable disagreement on this in published sources
proper name
the constellation name