Science and Progress: Short Period Planets in Q1
Chris Lintott (Zookeeper Chris) and I wanted to give an update on what the team is working on, and on some of the changes made to the PH site to help us answer the question we are tackling right now. We used very simple cuts and visual inspection to come up with a preliminary list of planet candidates, which John discussed in an earlier post. We've been brainstorming about how to combine the multiple user classifications of each lightcurve (about 10 users look at each one) to tease out every transit in a database of over 2.0 million classifications. We are working hard on more sophisticated algorithms and techniques to take all your Q1 classifications and transit boxes and extract transits and planet candidates.
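One simple way to picture the combination step is as a vote. For each time stamp, count the fraction of volunteers who drew a transit box over it, then keep the stretches where that fraction clears a threshold. The sketch below assumes each user's boxes arrive as (start, end) times in days. The function names, the input format and the 0.5 threshold are all hypothetical. This is an illustration of the idea, not the algorithm the team is developing.

```python
from typing import List, Optional, Tuple

# Hypothetical input format: each user's transit boxes as (start, end) times in days.
Box = Tuple[float, float]


def vote_fraction(times: List[float], user_boxes: List[List[Box]]) -> List[float]:
    """For each time stamp, the fraction of users whose boxes cover it."""
    n_users = len(user_boxes)
    fractions = []
    for t in times:
        votes = sum(
            any(start <= t <= end for start, end in boxes)
            for boxes in user_boxes
        )
        fractions.append(votes / n_users)
    return fractions


def candidate_intervals(
    times: List[float], fractions: List[float], threshold: float = 0.5
) -> List[Tuple[float, float]]:
    """Merge consecutive time stamps whose vote fraction meets the threshold."""
    intervals = []
    current: Optional[List[float]] = None
    for t, f in zip(times, fractions):
        if f >= threshold:
            if current is None:
                current = [t, t]  # open a new candidate interval
            else:
                current[1] = t  # extend the open interval
        elif current is not None:
            intervals.append((current[0], current[1]))
            current = None
    if current is not None:
        intervals.append((current[0], current[1]))
    return intervals


# Toy example: four volunteers, three of whom boxed roughly the same dip.
times = [i * 0.5 for i in range(10)]
user_boxes = [[(1.0, 2.0)], [(1.0, 1.5)], [(0.9, 2.1)], []]
fractions = vote_fraction(times, user_boxes)
print(candidate_intervals(times, fractions))
```

Requiring agreement from several volunteers suppresses one-off boxes drawn around noise. The cost is that this approach may miss shallow transits that only a few people spot, so the threshold is a trade-off.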
After starting to look at your classifications and at the results from the simulated transits, Chris and I think an interesting question to explore is the abundance of planets on short-period orbits (less than 15 days) in the Q1 data. The Kepler team is doing something similar, and it will be very interesting to compare the two results. As an initial step we are only looking at planets bigger than 2 Earth radii, that is, the gas and ice giants, because their transits are more pronounced than those of smaller rocky planets. Planets smaller than 2 Earth radii will be much harder to detect, so we first want to develop the analysis tools and then come back to those planets later.
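Larger planets are easier to spot because transit depth scales roughly with the square of the planet-to-star radius ratio, depth ≈ (R_p / R_*)^2. This sketch assumes a Sun-like host star and ignores limb darkening and grazing transits. The radii are approximate reference values. The numbers describe only the size of the dip and say nothing about Kepler's actual noise levels.

```python
# Approximate transit depth as the fraction of the stellar disk the planet covers.
# Assumptions: a Sun-like host star, a full (non-grazing) transit, no limb darkening.
R_SUN_KM = 695_700.0
R_EARTH_KM = 6_371.0


def transit_depth(planet_radius_earth: float, star_radius_sun: float = 1.0) -> float:
    """Fractional dip in brightness, (R_planet / R_star) ** 2."""
    ratio = (planet_radius_earth * R_EARTH_KM) / (star_radius_sun * R_SUN_KM)
    return ratio ** 2


# Approximate radii in Earth radii (Neptune ~3.9, Jupiter ~11.2).
for label, radius in [("1 Earth radius", 1.0), ("2 Earth radii", 2.0),
                      ("Neptune-sized", 3.9), ("Jupiter-sized", 11.2)]:
    print(f"{label:>15}: {transit_depth(radius) * 1e6:7.0f} ppm")
```

Under these assumptions, a 2 Earth-radius planet dims a Sun-like star by a few hundred parts per million (about 0.03%). A Jupiter-sized planet dims it by about 1%, which is why the bigger planets stand out much more clearly by eye.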
With the transit discoveries alone we can't answer this question, because we don't know how complete the sample is. If we found 120 Neptune-sized planets, for example, we couldn't say anything about their abundance compared to Jupiter-sized planets, since we don't know how many we might have missed in the data set. This is where the synthetic transits we insert into the interface play an important role. If users flag 100% of the Jupiter-sized simulations with orbital periods shorter than 15 days, but only 50% of the Neptune-sized synthetic transits, then we know that the number of transiting Neptunes in the real light curves is a factor of two larger than what we found. With this completeness estimate we can debias our sample and begin to understand the spectrum of solar systems, which provides crucial context for our own solar system.
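The correction described above amounts to dividing the number of planets found by the fraction of synthetic transits that volunteers recovered in the same size and period range. Here is a minimal sketch using the post's illustrative numbers. The count of 100 injected synthetics per class is hypothetical, chosen to match the 100% and 50% recovery rates.

```python
def recovery_fraction(n_recovered: int, n_injected: int) -> float:
    """Fraction of injected synthetic transits that volunteers flagged."""
    if n_injected <= 0:
        raise ValueError("need at least one injected synthetic")
    return n_recovered / n_injected


def debiased_count(n_found: int, completeness: float) -> float:
    """Estimate the true number of transiting planets from the number found."""
    if not 0.0 < completeness <= 1.0:
        raise ValueError("completeness must be in (0, 1]")
    return n_found / completeness


# Illustrative numbers from the post: all Jupiter-sized synthetics flagged,
# only half of the Neptune-sized ones.
jupiter_completeness = recovery_fraction(100, 100)  # 1.0
neptune_completeness = recovery_fraction(50, 100)   # 0.5

# If 120 Neptune-sized planets were found, the corrected estimate doubles.
print(debiased_count(120, neptune_completeness))  # 240.0
```

In a real analysis, completeness would presumably be measured separately for each period and radius bin. Its uncertainty would also need to be carried through to the corrected count.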
We've found that this analysis needs more synthetic lightcurves, with finer resolution in period and radius. Starting today, mixed in with the Q2 data, we will be showing newly generated synthetic Q1 lightcurves made specifically for this task. As always with the simulated transits, we will mark the simulated transit points in red after you've classified the star, and we will label the lightcurve as simulated data in Talk. With the results from these synthetics, we can better tune our tools for extracting transits from your classifications. We can also collect enough synthetics to calculate the short-period planet detection efficiency for Planet Hunters. The new synthetics won't be the only non-Q2 lightcurves you see. The Kepler team released about 5800 additional Q1 lightcurves on Feb 1st. Now that the Q2 data upload is complete, these have been added to the database, and we'll be showing them mixed into the classify interface. We will also show a small subset of the previously classified Q1 data, so we can examine how classifications have changed since December.
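More synthetics and finer bins go together for a simple reason. The detection efficiency in each period and radius bin is a recovered-over-injected fraction, and its binomial standard error, sqrt(p(1 - p)/n), shrinks only as the square root of the number n of synthetics in that bin. Halving the error in a bin takes roughly four times as many synthetics, and splitting the grid more finely leaves fewer synthetics in each bin. The sketch below assumes a hypothetical record of (period in days, radius in Earth radii, recovered) for each synthetic. The bin edges are illustrative only.

```python
import bisect
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical record per synthetic: (period_days, radius_earth, recovered_by_volunteers)
Synthetic = Tuple[float, float, bool]


def find_bin(value: float, edges: List[float]) -> Optional[int]:
    """Index of the half-open bin [edges[i], edges[i+1]) holding value, or None."""
    i = bisect.bisect_right(edges, value) - 1
    return i if 0 <= i < len(edges) - 1 else None


def efficiency_grid(
    synthetics: List[Synthetic], period_edges: List[float], radius_edges: List[float]
) -> Dict[Tuple[int, int], Tuple[float, float, int]]:
    """Map (period_bin, radius_bin) -> (efficiency, binomial std. error, n_injected)."""
    counts = defaultdict(lambda: [0, 0])  # [n_recovered, n_injected]
    for period, radius, recovered in synthetics:
        i, j = find_bin(period, period_edges), find_bin(radius, radius_edges)
        if i is None or j is None:
            continue  # outside the grid, e.g. periods of 15 days or more
        counts[(i, j)][1] += 1
        if recovered:
            counts[(i, j)][0] += 1
    grid = {}
    for key, (k, n) in counts.items():
        p = k / n
        grid[key] = (p, math.sqrt(p * (1 - p) / n), n)
    return grid


period_edges = [0.0, 5.0, 10.0, 15.0]  # days
radius_edges = [2.0, 4.0, 8.0, 16.0]   # Earth radii
synthetics = [
    (3.0, 3.0, True), (4.0, 2.5, False), (1.0, 3.5, True), (2.0, 2.0, False),
    (12.0, 10.0, True),
    (20.0, 3.0, True),  # period beyond 15 days: ignored
]
grid = efficiency_grid(synthetics, period_edges, radius_edges)
for key, (eff, err, n) in sorted(grid.items()):
    print(key, f"efficiency={eff:.2f} +/- {err:.2f} (n={n})")
```

This simple error estimate collapses to zero when a bin is all-or-nothing, as in the single-synthetic bin above. With small n, a Wilson score interval or a Bayesian estimate would better reflect the real uncertainty.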
Chris and I are aiming to have the bulk of the analysis complete before October, so we can present the results at the joint meeting of the European Planetary Science Congress (EPSC) and the American Astronomical Society's Division for Planetary Sciences (DPS), to be held in Nantes, France, in October. Abstracts are due in May, so we need to start work now to have results ready for the Nantes meeting. We will keep you posted on our progress and results as time goes on. With your help, we think this will lead to a very interesting paper.
11 responses to “Science and Progress: Short Period Planets in Q1”
Do these synthetic light curves have gaps? If not, it'd be pretty easy to tell whether a lc contains a simulated transit. I guess you had that in mind, but you never know 🙂
The new synthetic lcs I'm talking about are from Q1, so they have no gaps. (We use the real lightcurves to inject the transit signals. That's the best way we have of modeling the noise and error bars properly.) But we are also showing a subset of the old Q1 data (which has been classified before) to see how classifications have evolved since launch, the new Q1 lcs released on Feb 1st, and the new Q1 synthetics. So a lightcurve without a gap is not necessarily a synthetic. A good fraction of the time, a lightcurve without a gap will actually be an untouched Q1 lightcurve (i.e. real data).
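Injecting into real lightcurves means each synthetic inherits the star's genuine noise, gaps and error bars, and only the transit is added. Here is a minimal sketch using a box-shaped transit. A box is a simplification: a realistic model would include limb darkening and ingress/egress. The function name and parameters are hypothetical and are not taken from the Planet Hunters pipeline.

```python
from typing import List


def inject_box_transit(
    times: List[float],
    flux: List[float],
    period: float,
    epoch: float,
    duration: float,
    depth: float,
) -> List[float]:
    """Return a copy of flux with periodic box-shaped dips multiplied in.

    times, period, epoch (a mid-transit time) and duration share the same units (days).
    Multiplying, rather than subtracting, scales the star's own scatter inside transit.
    """
    injected = []
    for t, f in zip(times, flux):
        # Time from the nearest mid-transit, wrapped into [-period/2, period/2).
        dt = (t - epoch + period / 2) % period - period / 2
        in_transit = abs(dt) <= duration / 2
        injected.append(f * (1 - depth) if in_transit else f)
    return injected


# Toy example: a flat "lightcurve" sampled every 0.1 day for 10 days.
times = [i * 0.1 for i in range(101)]
flux = [1.0] * len(times)
synthetic = inject_box_transit(times, flux, period=5.0, epoch=2.0, duration=0.3, depth=0.01)
print(sum(f < 1.0 for f in synthetic), min(synthetic))
```

Because the function only modifies the flux values it is given, any gaps in the input time series carry straight through to the synthetic. That matches the point above that Q1-based synthetics have no gaps while Q2-based ones would.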
I might have asked the wrong question. The point is: are you going to use Q2 lcs to create simulated transits? Right now, if I see a lc with gaps but without spikes (like a safe mode event), I can be almost sure the data comes from Q2. I then know there are no simulated transits in that lc, since there are no Q2 simulations. So firstly, does that affect your estimates, and secondly, am I missing something here? 🙂
Ah, I'm not talking about using these synthetics for the Q2 analysis. For each quarter, we'll have to generate the synthetic transit lightcurves from that quarter's own lightcurves, so that we properly reproduce its noise and processing properties.
I'm really focusing on Q1 right now, because that's the dataset we have complete, so I'm introducing new synthetics for Q1. We're re-introducing some of the old lcs to see how classifications have changed, and adding in the new Q1 data that no one has classified yet, plus the new Q1 synthetics. I'm not talking about Q2 right now. (Yes, Q2 synthetics are made with Q2 lcs, and they have gaps 🙂) So if there's no gap, you will know it's a Q1 lightcurve, but you won't know whether it's a simulation or the real deal.
On Q2: I haven't said anything about the Q2 synthetics, but we do make synthetics for each quarter of data we have. So just because there's a gap doesn't mean it's not a synthetic lc. No gap just means it's a Q1 lightcurve of some kind. With the priorities as we have them set, the chance of seeing a simulated lc is dialed down pretty low. We'll be giving a higher priority to the Q1 lcs coming into the site (new, old and simulated) so we can get them finished and analyzed.
Hope that clarifies things a bit,
More than a bit, thanks for your patience 🙂
One thing I've often wondered about with the synthetic data is that I can usually tell which light curves are synthetic, because the vertical white bars lining up the points are denser than in the average light curve. A case in point is http://www.planethunters.org/sources/SPH10265929. In that example I found the transit, but I had been expecting one somewhere in the light curve. I don't know if I would have marked it had I not recognized it as simulated data that presumably contained a transit. I'm more prone to mark even marginal (or imaginary) transits in simulated data, simply because I'm expecting one or more to be there. I'm sure I'm not the only one who can recognize the simulated data. Could this skew the accuracy of simulated data as a way of determining the smallest transit events likely to be detected, or does it help validate the threshold limits? It also raises the question: should I be marking potential transits in real data as liberally as I would in simulated data? (If I truly can't see any viable, or even speculative, potential transits in the simulated data, I don't mark anything.)
Yeah, that shouldn't be the case. Thanks for pointing it out. We need the error bars to be as close as possible to those in the Kepler data. The design team is tracking down what the issue is. Much appreciated.
It seems this problem affected the old synthetics but was fixed for the new synthetics I've uploaded, so it doesn't affect this analysis. We've also fixed the old synthetics.
The synthetics are as intended now. I can't tell the difference between them and real data.