Over the last few months, there has been a lot of discussion around ethics and open source software development. Since open source software is becoming a key component of every industry and of society's digital transformation, I think the discussion is worth having. This post is my first set of humble ideas on the topic.
Continue reading “Ethics and open source software development”
My experience with a Chromebook
This post briefly describes my short experience with a Chromebook device, a Samsung Chromebook Pro.
Over the past year I’ve been looking for a tablet-form-factor device with Linux and OSS inside: light, cheap, and suitable for working remotely. Sadly, it seems such a thing doesn’t exist, but many people have recommended that I try Google Chromebooks.
Happy birthday Liferay Spain!
It seems that Liferay Spain is celebrating its 10th birthday today! Happy Birthday!!
Let me tell you a story… Yes, this is one of those moments when I feel a little bit old.
How I met Liferay
Analyzing Open Source development (part 3)
In the last post about analyzing open source development, I mentioned that this one would be about merging contributor information so that every project contributor has a single, unique identity.
But before that, I would like to explore something different: how to get data from multiple repositories. What happens when I want data from all of a GitHub organization’s or user’s repositories?
The obvious answer would be:
1. Let’s get the list of repositories:
import requests

def github_git_repositories(orgName):
    # Search for all repositories belonging to the organization
    query = "org:{}".format(orgName)
    page = 1
    repos = []
    r = requests.get('https://api.github.com/search/repositories?q={}&page={}'.format(query, page))
    items = r.json()['items']
    # The search API paginates its results, so keep requesting
    # pages until an empty one comes back
    while len(items) > 0:
        for item in items:
            repos.append(item['clone_url'])
        page += 1
        r = requests.get('https://api.github.com/search/repositories?q={}&page={}'.format(query, page))
        items = r.json()['items']
    return repos
2. And now, for each repository, run the code from the previous post to get a DataFrame, and concatenate all of them with:
df = pd.concat(dataframes)
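As a minimal sketch of that second step (the two input DataFrames here are hypothetical stand-ins for the per-repository ones built with the previous post’s code):

```python
import pandas as pd

# Hypothetical per-repository DataFrames, standing in for the ones
# produced by the extraction code from the previous post
df_repo1 = pd.DataFrame({'commit': ['a1', 'b2'], 'files': [3, 1]})
df_repo2 = pd.DataFrame({'commit': ['c3'], 'files': [2]})

# Stack them into a single DataFrame, renumbering the index
df = pd.concat([df_repo1, df_repo2], ignore_index=True)
```

With `ignore_index=True`, the resulting DataFrame gets a fresh 0..n-1 index instead of repeating each repository’s original row numbers.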
For organizations or users with a few repositories, this would work. But for those with hundreds of repositories, how long would it take to fetch and extract information from them one by one?
Would there be a faster approach? Let’s play with threads and queues…
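As a rough sketch of the idea (not the post’s actual implementation): put the repository URLs in a queue, and let a pool of worker threads drain it in parallel. The `worker` callable here is a hypothetical placeholder for whatever fetches and processes one repository.

```python
import queue
import threading

def fetch_all(repo_urls, worker, num_threads=4):
    """Run worker(url) for every URL using a pool of threads."""
    tasks = queue.Queue()
    for url in repo_urls:
        tasks.put(url)

    results = []
    lock = threading.Lock()

    def run():
        # Each thread keeps pulling URLs until the queue is empty
        while True:
            try:
                url = tasks.get_nowait()
            except queue.Empty:
                return
            result = worker(url)
            with lock:
                results.append(result)

    threads = [threading.Thread(target=run) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Since fetching repositories is I/O-bound, threads help despite Python’s GIL: while one thread waits on the network, the others can keep working.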
Continue reading “Analyzing Open Source development (part 3)”
Analyzing Open Source development (part 2)
Following up on the previous post, which introduced how to get Open Source development data, it’s time to play with it. I’ll use Pandas, the Python Data Analysis Library, as the main tool in this post.
Playing with data
Let’s create a simple DataFrame with information about the files touched in each commit. The script could look like this:
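As one possible sketch (not necessarily the post’s actual script): parse the output of something like `git log --pretty=format:"#%H" --numstat` into one row per (commit, file) pair. The log text here is a hard-coded sample so the snippet is self-contained.

```python
import pandas as pd

# Hard-coded sample of `git log --pretty=format:"#%H" --numstat` output;
# in practice this would come from running git on a real repository
sample_log = """#a1b2c3
3\t1\tREADME.md
10\t2\tsrc/main.py
#d4e5f6
0\t5\tsrc/main.py"""

rows = []
commit = None
for line in sample_log.splitlines():
    if line.startswith('#'):
        commit = line[1:]  # the commit hash introduces each block
    elif line.strip():
        # numstat lines: added<TAB>removed<TAB>path
        added, removed, path = line.split('\t')
        rows.append({'commit': commit, 'file': path,
                     'added': int(added), 'removed': int(removed)})

df = pd.DataFrame(rows)
```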
Continue reading “Analyzing Open Source development (part 2)”