Summary of SETA rewrite

It’s been three months since GSoC started, and it has now come to an end. First of all, I greatly appreciate my mentors (jmaher and armenzg). I have learned tons of things from them, and it would have been impossible to get here without their help. I also must say thank you to dustin, martiansideofthemoon and everyone who has helped me during these past three months. I really enjoy working with the people in the A-team and the Mozilla community. I believe I will stay involved as long as possible!

This GSoC project involved several parts: (1) refactor SETA’s code and make it robust, (2) make SETA work on Heroku, (3) reduce the database size and use data from other systems, and (4) integrate with the Gecko decision task.

For code refactoring, we fixed many existing pep8 errors and added flake8 support in #PR87. We also verified and removed some redundant packages from requirements.txt because we no longer needed them. Furthermore, to make the codebase more readable and easier to test, we started using SQLAlchemy for database operations instead of embedded SQL statements, and added some tests for it in #PR109. SQLAlchemy turned out to be a fantastic choice for us because it not only makes tests easier to write but also gives us faster database querying and storing. In #PR84, we use fetch_json instead of the pushlog endpoint, which makes things cleaner. As a bonus, we fixed insert key errors in #PR92 and added a failure column in #PR82. We also made SETA display job results appropriately in #PR90. In #PR91, we made linux64 debug jobs visible on SETA. At the moment this is useful not only for linux64 debug jobs but also for other job data that comes from Taskcluster.

The second part of this project was to make SETA work on Heroku; all the related PRs are included in the MikeLing/ouija-rewrite branch. First of all, we needed to migrate our database from MySQL to PostgreSQL (the default database on Heroku), and things became much easier after we switched to SQLAlchemy [PR]. Secondly, we needed to make updatedb.py and failures.py (the two scripts we use to update our database and store our analysis results) run automatically [PR]. Then we added a stage server for SETA. It can do pre-deployment validation, as is done in Treeherder, and helps us avoid accidentally breaking something on the target server. I must say thank you to armenzg again because he gave me a lot of help with this and helped me fork the repo to the stage server. Anyway, we couldn’t have gotten it working well on Heroku without armen’s help 🙂

The next step was to reduce the database size and use data from other systems. In #PR88, we made SETA store only high value jobs instead of low value jobs (because we only require around 165 high value jobs, while there are about 2000 low value jobs) and store 90 days of data instead of 180 days in our database. As Joel said, it’s a big win for reducing our database size :). In #PR89 we got rid of ‘logfile’ in the testjobs table because it wasn’t being used in the analysis of failures. Then, in #PR93 and #PR100, we started using the runnable API instead of the uniquejobs table and cached its result locally as runnablejobs.json. This allows us to query all job types and related information more accurately and in real time. As a bonus, we use underscore.js to simplify our JavaScript and make our JS code more readable. Other related PRs are #PR112, #PR99, #PR105, and #PR106.
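
The 90-day retention rule can be pictured with a small sketch. This is only an illustration and not SETA’s actual code: the function name prune_old_jobs, the dict layout of a job and the 'date' key are all assumptions made for the example.

```python
from datetime import datetime, timedelta

RETENTION_DAYS = 90  # the post says 90 days are kept instead of 180


def prune_old_jobs(jobs, now, retention_days=RETENTION_DAYS):
    """Return only the jobs whose 'date' falls inside the retention window.

    `jobs` is a hypothetical list of dicts holding a datetime under 'date'.
    """
    cutoff = now - timedelta(days=retention_days)
    return [job for job in jobs if job["date"] >= cutoff]
```

Passing `now` in explicitly (instead of calling datetime.now() inside) keeps the function easy to test with fixed dates.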

The final piece of this project was to integrate with the Gecko decision task. On the server side, we separated Taskcluster jobs from Buildbot jobs and started listing all low value jobs, which ensures that brand new jobs run by default (#PR101 and #PR113). TaskCluster can query the low value job list from the server and create a decision task based on it (you can check it out at http://seta-dev.herokuapp.com/data/setadetails/?taskcluster=1).

We also found a way to identify new jobs from runnablejobs.json and remove expired jobs from preseed.json in #PR112. In bug 1287018, we are trying to figure out how to make TaskCluster use data from SETA to schedule tasks, and I have committed several patches for it. The Gecko decision task is a vital part of our task scheduling, and a lot of things still need to be discussed there. This is now a stretch goal for the project, and I will keep working on it after the GSoC work period :).
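
One possible way to spot new and expired jobs is a plain set difference. The sketch below is an assumption on my side and not the code from the PR: it pretends that both runnablejobs.json and preseed.json boil down to lists of job-name strings, and the function name diff_jobs is made up for the example.

```python
def diff_jobs(runnable_jobs, preseed_jobs):
    """Compare the current runnable jobs with the preseed list.

    Both arguments are hypothetical iterables of job-name strings.
    Returns (new_jobs, expired_jobs) as sorted lists.
    """
    runnable = set(runnable_jobs)
    preseed = set(preseed_jobs)
    new_jobs = sorted(runnable - preseed)      # runnable, but not in preseed
    expired_jobs = sorted(preseed - runnable)  # in preseed, no longer runnable
    return new_jobs, expired_jobs
```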

SETA rewrite-Database Migration

Before I write my notes about GSoC, I really want to say that this week was really messed up. My health has been like a roller coaster: my head feels heavy sometimes and I got an allergy (maybe). I can barely remember the last time I had a skin allergy (maybe when I was 6 or 7 years old), and I don’t think seeing a doctor is a good choice because the only thing they could do is give me some anti-allergy medicine, which can make me dizzy all day 😛 Whatever, let’s flip to the next page.

Warning: Due to my poor SQL knowledge, I actually have no idea how to describe my work this weekend. I kept making mistakes on very basic and simple things, and it took me a lot of time to figure them out. So this blog post may look like a patchwork. I’m sorry about that. 😦

“If Time Can Roll Back, ……”

Alright, this title is all I thought about during the migration work. “If time could roll back, I really should have focused on my database class.” “If time could roll back, I really should have looked into it more before asking that dumb question.” And so on.

Anyway, first of all, once I could check the Heroku log for SETA, I found an error message like: Error R10 (Boot timeout) -> Web process failed to bind to $PORT within 60 seconds of launch. After I googled it, I found that this error happens because I use a ‘web’ dyno to boot my application, and a “web” type application MUST listen on a port. So we now read $PORT from the environment when booting the server. Now ouija can ‘run’ on Heroku. But that’s far from enough.
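
Here is a minimal sketch of reading $PORT at boot. It is not ouija’s real startup code: it uses Python’s built-in http.server as a stand-in for the web app, and the names resolve_port, PingHandler and build_server, as well as the local default port 5000, are assumptions for the example.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def resolve_port(environ=None, default=5000):
    """Return the port from $PORT, falling back to a local default."""
    environ = os.environ if environ is None else environ
    return int(environ.get("PORT", default))


class PingHandler(BaseHTTPRequestHandler):
    """Tiny stand-in for the real web app: answers every GET with 'ok'."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"ok\n")


def build_server(environ=None):
    """Create a server bound to all interfaces on the resolved port."""
    return HTTPServer(("0.0.0.0", resolve_port(environ)), PingHandler)
```

Calling build_server().serve_forever() would start it. On Heroku the port comes from $PORT; on a local machine, where $PORT is usually unset, the sketch falls back to 5000.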

Problems kept showing up when I moved on to the database part. The first one was that I didn’t know how to use it on Heroku 😛 Everything looked good to me, except that I found no data had actually been written into the database! I spent a long time finding out why I could connect and run SQL commands on the database but no data was written into it afterwards. Finally, I found that I forgot to ‘commit’ after executing the SQL commands (Boom!)
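
The missing ‘commit’ is easy to reproduce. The sketch below uses Python’s built-in sqlite3 as a stand-in for psycopg2 (an assumption, so that it runs without a Postgres server); both follow the DB-API, where an INSERT stays inside an open transaction until commit() is called. The helper names and the one-column testjobs table are made up for the example.

```python
import os
import sqlite3
import tempfile


def create_schema(db_path):
    """Create a hypothetical one-column testjobs table."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS testjobs (name TEXT)")
    conn.commit()
    conn.close()


def insert_job(db_path, name, commit):
    """Insert one row; the row only survives close() if commit is True."""
    conn = sqlite3.connect(db_path)
    conn.execute("INSERT INTO testjobs (name) VALUES (?)", (name,))
    if commit:
        conn.commit()
    conn.close()  # an uncommitted transaction is discarded here


def count_jobs(db_path):
    """Count rows using a fresh connection, like a second process would."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT COUNT(*) FROM testjobs").fetchone()[0]
    finally:
        conn.close()
```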

OK, we can import data into the database now... oh wait, not yet. Errors kept showing up when I used failures.py to update jobtype. The reason is that in Postgres we can’t store a string array as varchar (which is what we do in MySQL). PostgreSQL, as an object-relational database, has many data types that MySQL doesn’t. For example, we store ‘jobtype’ (e.g. [‘android-4-3-armv7-api15’, ‘debug’, ‘crashtest-10’]) as varchar in MySQL, but Postgres has an ‘array’ type, so we can just define it as a text array. Furthermore, it tells me psycopg2.ProgrammingError: no results to fetch in this line. The documentation says that a ProgrammingError is raised if the previous call to execute*() did not produce any result set or no call was issued yet, but it’s fine when we do the same thing with MySQLdb 😛
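
One way to avoid the “no results to fetch” error is to check whether the last statement produced a result set before fetching. The sketch below relies on the DB-API rule that cursor.description is None for statements that return no rows; it is demonstrated with the built-in sqlite3 only so that it runs without a Postgres server, and the names fetch_rows and demo are assumptions for the example.

```python
import sqlite3


def fetch_rows(cursor):
    """Return the result rows, or [] if the last statement had no result set."""
    if cursor.description is None:  # e.g. after an INSERT or UPDATE
        return []
    return cursor.fetchall()


def demo():
    """Run an INSERT and a SELECT and fetch after each of them."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE seta (jobtype TEXT)")
    cur.execute("INSERT INTO seta VALUES ('crashtest-10')")
    after_insert = fetch_rows(cur)
    cur.execute("SELECT jobtype FROM seta")
    after_select = fetch_rows(cur)
    conn.close()
    return after_insert, after_select
```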

After resolving the problems mentioned above, we can finally visit ouija on Heroku (\o/)

SETA rewrite-heroku deploy

The first step of the SETA rewrite is to deploy it on the Heroku platform. One small difference is that we need to switch our database to Postgres, Heroku’s native database.

Why heroku

Heroku is a platform as a service (PaaS) that enables developers to build and run applications entirely in the cloud. When you need to deploy your code, the only thing you need to do is link your GitHub repository (let’s assume your code is stored there) and push it to Heroku. Done! Everything is set up then. App deployment and code management become much easier and faster, and you can scale the hardware configuration as you want. More information can be found on its home page.

I won’t spend too much time on this section because the only thing I did (as I said) was set up my local environment and **push**. The stickiest thing I met in this step is that I can’t use any add-ons until I validate my Heroku account with a credit card, which I don’t have. We solved it in the end by moving the app’s ownership to the Mozilla corp. Thank you Armen 🙂

Database migration

PostgreSQL, to quote, is an object-relational database management system (ORDBMS) with an emphasis on extensibility and standards compliance. As a database server, its primary function is to store data securely, supporting best practices, and to allow for retrieval at the request of other software applications. It can handle workloads ranging from small single-machine applications to large Internet-facing applications with many concurrent users. (I just copied this from Wikipedia, never mind :P)

The first thing we need to do is update our SQL file so it works on PostgreSQL. Postgres doesn’t accept backticks around table or column names. For example, in MySQL we could create a table with ‘CREATE TABLE IF NOT EXISTS `tablename`’, but for Postgres it must be updated to ‘CREATE TABLE IF NOT EXISTS tablename’. Postgres also doesn’t have a ‘datetime’ column type, but we can use ‘timestamp’ (without time zone) instead. OK, let’s see what we’ve got now 😉

           List of relations
 Schema |    Name    | Type  |  Owner
--------+------------+-------+----------
 public | dailyjobs  | table | mikeling
 public | seta       | table | mikeling
 public | testjobs   | table | mikeling
 public | uniquejobs | table | mikeling
(4 rows)
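
The two changes to the SQL file (dropping the backticks and swapping datetime for timestamp) can be sketched as a small text rewrite. This is a naive illustration and not the way the schema file was actually converted; the function name mysql_ddl_to_postgres is made up for the example, and it would also rewrite a column that happens to be named datetime.

```python
import re


def mysql_ddl_to_postgres(ddl):
    """Apply the two rewrites described above to a MySQL DDL string."""
    ddl = ddl.replace("`", "")  # Postgres rejects backtick-quoted names
    # Swap the MySQL-only 'datetime' column type for 'timestamp'.
    return re.sub(r"\bdatetime\b", "timestamp", ddl, flags=re.IGNORECASE)
```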

All right, now we should make ouija connect to the database and query it. This part of the code is in updatedb.py. We should no longer use MySQLdb there because Postgres is our main dish here, so we import psycopg2 instead. But that’s not enough to make it work on Postgres. The double-quoted values in our .execute() calls don’t work with psycopg2, because PostgreSQL reads double-quoted text as an identifier rather than a string. All right, let’s switch to ‘single quotes’ now 😛

(SETA)MikeLings-MacBook-Pro-2:ouija mikeling$ python src/updatedb.py --delta 2
INFO:root:Downloader 1: a7783f2b4548 - 2016-05-07 02:11:30
INFO:root:uploaded 1/(13) results for rev: 27cfedea2aa7, branch: try, date: 2016-05-07 03:40:53
INFO:root:Downloader 1: 3966941f9dc3 - 2016-05-07 03:40:53

OK, we could deploy it on Heroku now and see whether it works remotely. But actually it doesn’t, and I don’t know how to fix it yet, because I can’t access the log information (one disadvantage of giving up ownership). So some more work needs to be done next week, once I can read the logs and configure the app on Heroku. BTW, I also fixed one small issue in ouija.

Bravo! Been accepted by GSoC

As a postgraduate student at Nanchang Hangkong University, I have been accepted by GSoC to work for Mozilla. It’s a superb honor for me, so I decided to write down the things I do for GSoC and all my open source activity in this blog.

What’s the GSoC

The Google Summer of Code, often abbreviated to GSoC, is an international annual program, first held from May to August 2005, in which Google awards stipends (of US$5,500, as of 2016) to all students who successfully complete a requested free and open-source software coding project during the summer.

Mozilla And A-team

I believe every computer fan has heard of Mozilla. And I believe you know about Firefox, an independent, people-first browser made by Mozilla. Since its name has long been well known, I will only talk about the A-team and the project I am involved in.

The A-team is the nickname of the “Automation and Tools team”, a team of engineers focused on improving the quality and productivity of engineering at Mozilla. For me, it’s an amazing experience to work with the people there and learn from them. There is a lot of awesome stuff there for automated testing and performance display, and my GSoC project is about rewriting SETA, a tool for extraneous tests. Some more details are here. My mentors are Jmaher and Armenzg.