• vegetaaaaaaa@lemmy.world (OP)
    1 year ago

    Very sorry for the missing attribution. I added it in the header, should be online now.

    No problem, thanks for fixing it so quickly!

    at least it provides the ability to find projects that likely support docker for example

    This information is already present but we need to make it easier to browse projects by platform -> https://github.com/awesome-selfhosted/awesome-selfhosted-data/issues/71

    There should be a label with the max number of commits to provide some scale,

    ^now I see it but it’s a bit small :)^

    I looked a bit into how best to determine the activity level of a project, and for me commit history is one of the strongest signs. Also, GitHub uses this kind of graph in a user’s repo overview, so it’s already familiar to GitHub users.

    I see, I agree it’s a good metric. I am wary of adding too much information that relies on GitHub APIs because 1. it encourages centralization on a proprietary platform 2. we are already running into API rate limits with just the info we currently gather (stars count + last commit date). And we would need a way to store this graph information in the raw YAML data somehow. If you want to create an issue for this, to gather more feedback and so it can be discussed further, please do!

    • m4z@lemmy.world
      1 year ago

      I am wary of adding too much information that relies on GitHub APIs because 1. it encourages centralization on a proprietary platform

      Couldn’t agree more! This is why I decided not to use discussions or issue count, since those are GitHub-specific. Commit count could be gathered from any platform that uses Git.

      2. we are already running into API rate limits with just the info we currently gather (stars count + last commit date).

      I’m using the GitHub GraphQL API to query all necessary information for 100 repos at a time, including commit count per month. The GraphQL rate limit allows up to 5000 points per hour, and one of these queries only costs 1 point. Here is my code to generate the query: https://github.com/mkitzmann/awwesome/blob/main/src/lib/query.ts
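To illustrate the batching idea described above, here is a minimal sketch in Python. It is not the code from the linked query.ts: the helper names `chunked` and `build_query`, the `repoN` aliases, and the exact set of selected fields are assumptions chosen for illustration. It only builds the query text and sends nothing over the network.

```python
import json


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_query(repos, since):
    """Build one GraphQL document covering every (owner, name) pair in `repos`.

    `since` is an ISO 8601 timestamp; commits before it are not counted.
    Each repository gets its own alias (repo0, repo1, ...) so that many
    repositories can be requested in a single query.
    """
    blocks = []
    for index, (owner, name) in enumerate(repos):
        # json.dumps adds the surrounding double quotes and escapes the value.
        blocks.append(
            f"  repo{index}: repository(owner: {json.dumps(owner)}, name: {json.dumps(name)}) {{\n"
            "    stargazerCount\n"
            "    defaultBranchRef {\n"
            "      target {\n"
            "        ... on Commit {\n"
            "          committedDate\n"
            f"          history(since: {json.dumps(since)}) {{ totalCount }}\n"
            "        }\n"
            "      }\n"
            "    }\n"
            "  }"
        )
    return "query {\n" + "\n".join(blocks) + "\n}"


# Hypothetical input: 250 placeholder repositories, split into batches of 100.
repos = [("example-owner", f"example-repo-{n}") for n in range(250)]
queries = [build_query(batch, "2023-01-01T00:00:00Z") for batch in chunked(repos, 100)]
```

Each resulting string would then be sent as the `query` field of a JSON POST to https://api.github.com/graphql. Getting a count per month rather than a single total would need one aliased `history(since: ..., until: ...)` selection per month, which this sketch leaves out.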

      And we would need a way to store this graph information in the raw YAML data somehow. If you want to create an issue for this, to gather more feedback and so it can be discussed further, please do!

      The information is currently just "{month: count, …}", so it should be fairly simple to store in YAML. I will create an issue for it.
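A small sketch of what storing a "{month: count}" mapping in YAML could look like, using only the Python standard library. The key name `commit_history` and both helper functions are hypothetical; neither the awwesome code nor the awesome-selfhosted-data schema is known to use them.

```python
from collections import Counter


def commits_per_month(commit_dates):
    """Count commits per 'YYYY-MM' bucket from ISO 8601 date strings."""
    # The first seven characters of an ISO 8601 date are 'YYYY-MM'.
    return Counter(date[:7] for date in commit_dates)


def to_yaml_lines(counts, key="commit_history"):
    """Render the counts as a YAML mapping nested under `key` (hypothetical name)."""
    lines = [f"{key}:"]
    for month in sorted(counts):
        # Month keys are quoted so that YAML parsers keep them as strings.
        lines.append(f'  "{month}": {counts[month]}')
    return "\n".join(lines)


# Hypothetical commit timestamps for one project.
dates = ["2023-01-05T10:00:00Z", "2023-01-20T08:30:00Z", "2023-02-01T00:00:00Z"]
print(to_yaml_lines(commits_per_month(dates)))
```

With the three sample dates this prints a `commit_history:` mapping with two entries, `"2023-01": 2` and `"2023-02": 1`.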

      • vegetaaaaaaa@lemmy.world (OP)
        1 year ago

        The rate limit for GraphQL is allowing up to 5000 points per hour

        When using the automatic GITHUB_TOKEN created for each GitHub Actions workflow, the limit is only 1000 requests per hour [1]. The list has 1080 projects whose source is hosted on GitHub, and we must do 2 API calls for each one because, for some reason, the date of the last commit to the default branch is not available directly from the repos API endpoint [2] (do not trust updated_at/pushed_at, it’s a lie). So we currently have to add a sleep of ~7 seconds between API calls so that the metadata update job, which runs daily, does not hit the rate limit.
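The arithmetic behind these numbers can be checked with a short Python sketch. The figures (1080 projects, 2 calls each, 1000 requests per hour, ~7 second sleep) come from the comment above; the function names are made up for illustration, and the calculation assumes the sleep dominates the run time.

```python
def min_interval_seconds(limit_per_hour):
    """Smallest average gap between requests that stays within an hourly limit."""
    return 3600 / limit_per_hour


def job_duration_hours(projects, calls_per_project, sleep_seconds):
    """Approximate run time when the job sleeps after every API call."""
    return projects * calls_per_project * sleep_seconds / 3600


total_calls = 1080 * 2                       # 2160 requests per daily run
floor_gap = min_interval_seconds(1000)       # 3.6 seconds between requests
duration = job_duration_hours(1080, 2, 7)    # about 4.2 hours of sleeping
requests_per_hour = 3600 / 7                 # about 514 requests per hour

print(total_calls, floor_gap, round(duration, 1), round(requests_per_hour))
```

So 2160 requests cannot fit into a single hour under a 1000 requests/hour limit, and 3.6 seconds is the theoretical minimum spacing. A ~7 second sleep keeps the job at roughly half the limit, at the cost of a run of a bit over four hours.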

        The limit of 5000 requests/hour only applies to personal access tokens, which have far too many permissions on my personal account to be used in a shared/community project.

        Someone in https://old.reddit.com/r/selfhosted/comments/15y7y36 mentioned https://analyzemyrepo.com/analyze/inventree/InvenTree which may be interesting to integrate somehow.