At .astronomy this past week, the keynote speaker was Alberto
Accomazzi, introducing ADS Labs and ADS 2.0, which are really quite
impressive.
I was inspired to use ADS Labs to help me auto-generate a nicely
formatted CV for myself. I used Andy Casey's ADS-python as a starting
point: he introduces a useful convention for storing your ADS API key,
either in ~/.ads/dev_key or in the ADS_DEV_KEY environment variable:
import os

def get_dev_key():
    """ A convenience function for accessing a system-wide ADS Developer's Key """
    ads_dev_key_filename = os.path.abspath(os.path.expanduser('~/.ads/dev_key'))

    if os.path.exists(ads_dev_key_filename):
        with open(ads_dev_key_filename, 'r') as fp:
            dev_key = fp.readline().rstrip()
        return dev_key

    if 'ADS_DEV_KEY' in os.environ:
        return os.environ['ADS_DEV_KEY']

    raise IOError("no ADS API key found in ~/.ads/dev_key")
We'll use requests to send the request to ADS, then json to
parse the data.
import requests
import json
We then query the database using a keyword query (parameter q)
specifying the author. Other required parameters are the API key
(dev_key) and a filter to select only astronomy articles. The API
currently returns at most 200 rows per query, so I request that
maximum explicitly (the default is 10 or 20).
response = requests.post('http://adslabs.org/adsabs/api/search/',
                         params={'q': 'author:ginsburg, a',
                                 'dev_key': get_dev_key(),
                                 'rows': 200,
                                 'filter': 'database:astronomy'})
J = response.json()
J.keys()
[u'meta', u'results']
The JSON 'meta' key is just metadata about the query, including the
number of matches and the execution time.
J['meta']
{u'api-version': u'0.1',
 u'count': 54,
 u'hits': 54,
 u'qtime': 7,
 u'query': u'author:ginsburg, a'}
The 'results' key includes what we're actually interested in, under
another key 'docs'.
J['results'].keys()
[u'docs']
datalist = J['results']['docs']
type(datalist), len(datalist)
(list, 54)
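Since the query caps the response at 200 rows, it is worth checking that nothing was cut off. The sketch below uses a stand-in dictionary in place of the real response, and it assumes (this is not confirmed by the API output above) that meta['hits'] reports the total number of matches regardless of the rows limit. The helper name is_truncated is made up for illustration.

```python
# Stand-in for response.json(); the numbers mirror the output shown above.
J = {'meta': {'count': 54, 'hits': 54, 'qtime': 7},
     'results': {'docs': [{} for _ in range(54)]}}


def is_truncated(result):
    """Return True if the query matched more entries than were returned.

    ASSUMPTION: meta['hits'] is the total number of matches, independent
    of the 'rows' limit sent with the query.
    """
    n_returned = len(result['results']['docs'])
    return result['meta']['hits'] > n_returned


# A hypothetical response where 300 entries matched but only 200 came back.
big = {'meta': {'hits': 300}, 'results': {'docs': [{} for _ in range(200)]}}

print(is_truncated(J))
print(is_truncated(big))
```

If the check comes back True, the publication list is incomplete and the query needs to be narrowed or split.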
datalist is a list of the retrieved bibliographic entries. Each entry
is a dict; the first one has these keys:
datalist[0].keys()
[u'bibcode',
u'keyword',
u'pubdate',
u'bibstem',
u'property',
u'aff',
u'author',
u'citation_count',
u'pub',
u'page',
u'volume',
u'database',
u'doi',
u'year',
u'abstract',
u'title',
u'identifier',
u'issue',
u'id']
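Not every entry necessarily carries every one of these fields (the code further down checks for 'citation_count' and 'volume' before using them). A quick way to find the gaps before formatting is to report which fields each entry lacks. This is only a sketch: the helper name missing_fields, the default list of required fields, and the sample entries are all made up for illustration.

```python
def missing_fields(entries, required=('title', 'author', 'pubdate', 'bibstem')):
    """Map each incomplete entry's bibcode to the required fields it lacks."""
    report = {}
    for entry in entries:
        missing = [key for key in required if key not in entry]
        if missing:
            report[entry.get('bibcode', '<no bibcode>')] = missing
    return report


# Hypothetical entries, trimmed down to a few fields.
sample = [
    {'bibcode': 'A', 'title': ['T1'], 'author': ['X, Y.'],
     'pubdate': '2013-08-00', 'bibstem': ['ApJ']},
    {'bibcode': 'B', 'title': ['T2'], 'author': ['X, Y.']},
]

print(missing_fields(sample))
```

Running it on datalist instead of sample shows which entries would raise a KeyError in the formatting steps.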
At this point, most of the remaining work is building up a nicely
formatted output. We'll start with a web-specific example, using HTML
lists.
In this example, we'll make a list item that links the author names to
the article's ADS abstract page and uses a reasonably standard
bibliographic format:
Authors Month, Year, Journal
Title
fmt = (u' '
u'<li><a class="norm" href="http://adsabs.harvard.edu/abs/{adsbibid}">{creator}</a>'
u' {month}, <b>{year}</b> {journal}\n'
u' <br> {titlestring}')
We need to do a little data wrangling to get the individual JSON entries
into the appropriate format:
def wrangle(data, authorname='Ginsburg'):
    """ Create new fields from the input data to insert into the format string """
    data['month'] = data['pubdate'][5:7]
    # Generally, the last identifier is the published version,
    # while the first is an arXiv identifier
    # (data['identifier'] is a list)
    data['adsbibid'] = data['identifier'][-1]
    # data['title'] and data['bibstem'] are also lists
    data['titlestring'] = data['title'][0]
    data['journal'] = data['bibstem'][0]
    # This trick bolds my name in the list of authors
    # (unicode template, so non-ASCII names are safe too)
    data['authors'] = [u'<b>{}</b>'.format(x) if authorname in x else x for x in data['author']]
    # Separate names by semicolons
    data['creator'] = u"; ".join(data['authors'])
    return data
The return from wrangle is a dict with new keys that match the
keywords in the format string. Python's str.format method
ignores any extra keywords that we're uninterested in.
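To see that behavior in isolation, here is a small sketch with a made-up two-field template and entry. Extra keys are ignored, but a key that the template needs and the dict lacks raises a KeyError, which is why wrangle has to create every field the format string mentions.

```python
template = u'{creator} ({year})'

# 'volume' is not used by the template and is silently ignored.
entry = {'creator': u'Ginsburg, A.', 'year': u'2013', 'volume': u'773'}
line = template.format(**entry)
print(line)

# A missing field, on the other hand, is an error.
try:
    template.format(creator=u'Ginsburg, A.')
except KeyError as err:
    missing = err.args[0]
    print("missing field: " + missing)
```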
fmt.format(**wrangle(datalist[0]))
u' <li><a class="norm" href="http://adsabs.harvard.edu/abs/2013ApJ...773..102F">Fallscheer, C.; Reid, M. A.; Di Francesco, J.; Martin, P. G.; Hill, T.; Hennemann, M.; Nguyen-Luong, Q.; Motte, F.; Men\'shchikov, A.; Andr\xe9, Ph.; Ward-Thompson, D.; Griffin, M.; Kirk, J.; Konyves, V.; Rygl, K. L. J.; Sadavoy, S.; Sauvage, M.; Schneider, N.; Anderson, L. D.; Benedettini, M.; Bernard, J. -P.; Bontemps, S.; <b>Ginsburg, A.</b>; Molinari, S.; Polychroni, D.; Rivera-Ingraham, A.; Roussel, H.; Testi, L.; White, G.; Williams, J. P.; Wilson, C. D.; Wong, M.; Zavagno, A.</a> 08, <b>2013</b> ApJ\n <br> Herschel Reveals Massive Cold Clumps in NGC\xa07538'
Now to show it in the notebook...
import IPython.display
IPython.display.HTML(fmt.format(**wrangle(datalist[0])))
You can make a complete bibliography by looping over a few entries. The
ordered list (<ol>) tag makes a numbered list.
html = "<ol>" + "\n".join(fmt.format(**wrangle(datalist[ii])) for ii in xrange(3)) + "</ol>"
IPython.display.HTML(html)
If you want to make sure you only include refereed articles, use the
'property' tag.
print ['REFEREED' in d['property'] for d in datalist]
[True, False, True, False, False, False, True, True, False, False, False, False, True, True, True, False, False, True, True, False, False, True, True, True, False, False, False, False, True, False, False, True, True, True, True, True, True, False, False, False, False, False, False, True, True, False, False, False, False, False, True, False, False, True]
And don't forget that you can also include the citation count:
print "\n".join(["{} {}: {}".format(d['author'][0],d['year'],d['citation_count'])
                 for d in datalist
                 if 'citation_count' in d and 'REFEREED' in d['property']])
Fallscheer, C. 2013: 0
Ellsworth-Bowers, Timothy P. 2013: 2
Smith, Nathan 2013: 0
Harvey, Paul M. 2013: 2
Bressert, E. 2012: 10
Ginsburg, A. 2012: 11
Bally, John 2012: 0
Ginsburg, Adam 2011: 5
Battersby, C. 2011: 22
Schlingman, Wayne M. 2011: 17
Ginsburg, Adam 2011: 6
van Aarle, E. 2011: 12
Aguirre, James E. 2011: 72
Bally, John 2010: 36
Battersby, Cara 2010: 30
Yan, Chi-Hung 2010: 5
Bally, J. 2010: 20
Dunham, Miranda K. 2010: 27
Rosolowsky, Erik 2010: 80
Ginsburg, Adam G. 2009: 10
Rubin, D. 2009: 23
van de Steene, G. C. 2008: 4
Golitsyn, G. S. 1985: 4
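For a CV it can be handy to rank the refereed papers by citation count and to quote a total. The sketch below does both on a few stand-in entries: the authors, years, and counts are taken from the listing above, but the 'property' lists are assumed for illustration, and the last entry is a hypothetical non-refereed one with no citation count.

```python
entries = [
    {'author': ['Fallscheer, C.'], 'year': '2013', 'citation_count': 0,
     'property': ['REFEREED', 'ARTICLE']},
    {'author': ['Aguirre, James E.'], 'year': '2011', 'citation_count': 72,
     'property': ['REFEREED', 'ARTICLE']},
    {'author': ['Rosolowsky, Erik'], 'year': '2010', 'citation_count': 80,
     'property': ['REFEREED', 'ARTICLE']},
    # Hypothetical non-refereed entry without a citation count
    {'author': ['Ginsburg, A.'], 'year': '2012', 'property': ['EPRINT']},
]

# Keep refereed entries only, then sort by citations, most cited first.
refereed = [d for d in entries if 'REFEREED' in d.get('property', [])]
ranked = sorted(refereed, key=lambda d: d.get('citation_count', 0), reverse=True)
total = sum(d.get('citation_count', 0) for d in ranked)

for d in ranked:
    print("{} {}: {}".format(d['author'][0], d['year'], d.get('citation_count', 0)))
print("Total citations: {}".format(total))
```

Using d.get with a default of 0 treats a missing count as zero citations, which may or may not be what you want for very recent papers.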
Wishlist
There are a few other features that would be nice to add to the CV, but
some are not yet well-supported.
- The API returns the journal's full name, but right now not its short
  name ('bibstem')
- The bibtex entry is important for generating tex versions of CVs.
  Currently, it is not possible to completely reproduce one, largely
  because of the first point.
However, the ADS folks will certainly change this soon. You can find out
if they have by querying their API settings. If the query below returns
"True", then you can access the bibstem.
UPDATE 9/23/2013: Jay Luker @ADS added 'bibstem' to the allowed return
entries
permissions_response = requests.post('http://adslabs.org/adsabs/api/settings/',params={'dev_key':get_dev_key()})
permissions = permissions_response.json()
'bibstem' in permissions['allowed_fields']
True
In the meantime, you can get most of the way there. We'll create
"Article" entries for any articles or eprints and ignore abstracts
(e.g., conference abstracts). I don't have any books, but for others
that might be useful.
The approach we'll use is also a good way to reject unwanted articles in
the HTML bibliography above.
bibfmt = u"""@article{{{tagname},
abstract={{{abstract}}},
author={{{bibtexauthors}}},
month={{{month}}},
pages={{{page}}},
title={{{titlestring}}},
year={{{year}}},
volume={{{volume}}},
journal={{\\{lowercasejournal}}}
}}"""
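The brace counting in this template is easy to get wrong, so here is a cut-down version of it rendered with made-up values. In str.format, '{{' and '}}' produce literal braces, so '{{{year}}}' becomes the value wrapped in one pair of braces, and '\\' in the Python source is a single backslash in front of the journal macro. The names mini_fmt and rendered and the values passed in are illustrative only.

```python
mini_fmt = u"""@article{{{tagname},
  year={{{year}}},
  journal={{\\{lowercasejournal}}}
}}"""

rendered = mini_fmt.format(tagname=u'Ginsburg2012',
                           year=u'2012',
                           lowercasejournal=u'apj')
print(rendered)
# @article{Ginsburg2012,
#   year={2012},
#   journal={\apj}
# }
```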
Of course, it's necessary to wrangle the data again for the appropriate
author list formatting for bibtex:
def wrangleauthors(authorlist):
    """ Fit the author list into a bibtex-friendly format.
    Not the cleanest hack, since we need to allow for single-name
    authors (e.g., astropy collaboration, Planck collaboration, etc.)
    The triple braces are needed because the output needs literal braces:
    '{{' and '}}' are escaped braces, and the inner '{}' is the format field """
    splita = [[b.strip() for b in a.split(",")] for a in authorlist]
    bracketed = [u'{{{}}}, {}'.format(a[0], a[1].replace(" ","~"))
                 if len(a) > 1
                 else u'{{{}}}'.format(a[0])
                 for a in splita]
    return u" and ".join(bracketed)
wrangleauthors(datalist[0]['author'])
u"{Fallscheer}, C. and {Reid}, M.~A. and {Di Francesco}, J. and {Martin}, P.~G. and {Hill}, T. and {Hennemann}, M. and {Nguyen-Luong}, Q. and {Motte}, F. and {Men'shchikov}, A. and {Andr\xe9}, Ph. and {Ward-Thompson}, D. and {Griffin}, M. and {Kirk}, J. and {Konyves}, V. and {Rygl}, K.~L.~J. and {Sadavoy}, S. and {Sauvage}, M. and {Schneider}, N. and {Anderson}, L.~D. and {Benedettini}, M. and {Bernard}, J.~-P. and {Bontemps}, S. and {Ginsburg}, A. and {Molinari}, S. and {Polychroni}, D. and {Rivera-Ingraham}, A. and {Roussel}, H. and {Testi}, L. and {White}, G. and {Williams}, J.~P. and {Wilson}, C.~D. and {Wong}, M. and {Zavagno}, A."
Now we can start looping through, performing checks for article status,
and making bibentries. We'll use dateutil.parser.parse (from the
python-dateutil package) to turn month numbers into names.
import dateutil.parser

for d in datalist:
    d['bibtexauthors'] = wrangleauthors(d['author'])
    # pubdates don't include days, and sometimes don't include months,
    # so we have to be careful
    d['month'] = (dateutil.parser.parse(d['pubdate'][:-3]).strftime("%B")
                  if d['pubdate'][5:7] != '00'
                  else "")
    # To make the standard macros, e.g. \apj, \aa
    d['lowercasejournal'] = d['bibstem'][0].lower()
    # list -> string
    d['titlestring'] = d['title'][0]
    # need to make sure there is something to put in the volume field
    d['volume'] = d['volume'] if 'volume' in d else ''
    # tagname is [First author's last name][year]
    d['tagname'] = d['author'][0].split()[0].strip(",") + d['year']

bibdata = ""
for d in datalist:
    if 'ARTICLE' in d['property'] or 'EPRINT' in d['property']:
        bibdata += bibfmt.format(**d) + "\n\n"
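If you would rather not depend on a third-party package for the month names, the standard library's calendar module can do the same job. This is a swapped-in alternative, not what the loop above uses. It assumes pubdate strings look like 'YYYY-MM-00', with '00' standing for an unknown month, as the slicing above implies; the helper name month_name is made up. Note that calendar.month_name follows the current locale, as strftime("%B") does.

```python
import calendar


def month_name(pubdate):
    """Return the month name for an ADS-style 'YYYY-MM-00' date string,
    or '' when the month is missing, '00', or otherwise out of range."""
    month = pubdate[5:7]
    if not month.isdigit():
        return ""
    number = int(month)
    if not 1 <= number <= 12:
        return ""
    return calendar.month_name[number]


print(month_name('2013-08-00'))
print(repr(month_name('2013-00-00')))
```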
Now this data can be saved to a bibliography file and parsed by LaTeX.
import codecs # for writing unicode
with codecs.open('mypapers.bib','w',encoding='utf8') as f:
    f.write(bibdata)
Future work:
- Verify the bibtex
- Generate the CV LaTeX and compile it
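As a first step toward verifying the bibtex, one cheap sanity check is that the braces in the generated text balance, since an abstract or title containing a stray brace would break the entry. The sketch below is only that check, not a BibTeX parser: it ignores escaped braces such as \{ and the function name braces_balanced is made up.

```python
def braces_balanced(text):
    """Return True if every '{' in text is closed by a later '}'."""
    depth = 0
    for char in text:
        if char == '{':
            depth += 1
        elif char == '}':
            depth -= 1
            if depth < 0:
                # A closing brace appeared before its opening brace.
                return False
    return depth == 0


good = u"@article{Ginsburg2012,\n  year={2012}\n}"
bad = u"@article{Ginsburg2012,\n  title={An {unclosed brace}\n}"

print(braces_balanced(good))
print(braces_balanced(bad))
```

Running it on each entry separately, rather than on the whole bibdata string, makes it easier to see which paper is the culprit.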