This article explains how to get started with the Github connector in Peliqan. Please contact support if you have any questions or remarks.
In Peliqan, go to Connections and click Add new. Find Github in the list and select it, then click the Connect button. This opens Github, where you can authorize access for Peliqan. Once that is done, you are returned to Peliqan.
Explore & Combine
Explore your Github tables in the Peliqan “Explore” section, including files, commits, PRs and issues.
Sync file contents to the data warehouse
Use the custom pipeline script below to sync the contents of all files in your Github repo to the Peliqan data warehouse.
Writeback (Python scripts)
Below are examples of how to read from and write to Github from Peliqan Python scripts (data apps).
Write file to Github (commit)
Here’s an example to commit files to Github from a script in Peliqan:
Note: to update an existing file, you need its SHA. Use file_get() to retrieve the SHA of the file first, and include it when updating the file.
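For reference, the SHA that Github reports for a file is its Git blob hash, which can also be computed locally. The sketch below is a minimal illustration, assuming you have the raw bytes of the file; the helper name git_blob_sha is hypothetical and not part of the Peliqan connector. One possible use, assuming the sha column of the files table holds this blob hash, is to compare it with a locally computed value and skip the commit when nothing changed.

```python
import hashlib

def git_blob_sha(data: bytes) -> str:
    """Return the Git blob SHA-1 of raw file bytes (hypothetical helper)."""
    # Git hashes the header "blob <size>\0" followed by the file bytes.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()

# Hash some local content; compare this value with the SHA reported for the remote file.
local_sha = git_blob_sha(b"hello world\n")
```

If the two values match, the remote file already has the same contents and an update can be skipped.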
Using the Github connector, you can build CI/CD pipelines that, for example, push updates to data models (queries) and Python scripts from Peliqan to Github, and pull updates from Github into Peliqan.
Example: pull Python script update from Github into Peliqan
Example: pull data model (query) update from Github into Peliqan
Example: commit Python script from Peliqan to Github
Example: commit data model (query) from Peliqan to Github
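# Pipeline script: sync the contents of all files in a Github repo to the Peliqan data warehouse
import base64

github_schema_name = "github"
github_owner_name = "my_github_username"
github_repo_name = "my_repo"
BATCH_SIZE = 1000
MAX_FILE_SIZE = 1000000

dbconn = pq.dbconnect(pq.DW_NAME)
github_api = pq.connect("Github")

# Resume from the bookmark saved by the previous run; start from the epoch on the first run.
# To force a full resync, set bookmark = "1970-01-01 00:00:00" unconditionally.
bookmark = pq.get_state()
if not bookmark:
    bookmark = "1970-01-01 00:00:00"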
# Count the files that changed since the bookmark
count_query = f"""
    SELECT COUNT(*) as count
    FROM "{github_schema_name}".files
    WHERE type='blob' AND size<{MAX_FILE_SIZE} AND _sdc_batched_at > '{bookmark}'
"""
total_count = dbconn.fetch(pq.DW_NAME, query=count_query)[0]["count"]
processed = 0
latest_batched_at = bookmark

for offset in range(0, total_count, BATCH_SIZE):
    query = f"""
        SELECT path, sha, url, size, type, _sdc_batched_at
        FROM "{github_schema_name}".files
        WHERE type='blob' AND size<{MAX_FILE_SIZE} AND _sdc_batched_at > '{bookmark}'
        ORDER BY _sdc_batched_at
        LIMIT {BATCH_SIZE} OFFSET {offset}
    """
    files = dbconn.fetch(pq.DW_NAME, query=query)
    new_records = []
    for file in files:
        st.write("Downloading file: " + file["path"])
        file_content_resp = github_api.get("file", path = file["path"], repo = github_repo_name, owner = github_owner_name)
        content = file_content_resp.get("content")
        if file_content_resp.get("encoding") == "base64":
            content = base64.b64decode(content).decode("utf-8")
        record = {
            "path": file["path"],
            "sha": file["sha"],
            "content": content,
            "size": file["size"],
            "url": file["url"]
        }
        new_records.append(record)
        # Update batch bookmark
        if file["_sdc_batched_at"] > latest_batched_at:
            latest_batched_at = file["_sdc_batched_at"]
    if new_records:
        result = dbconn.write(github_schema_name, "files_content", records=new_records, pk="path")
        if result["status"] != "success":
            st.warning("Could not write to DWH")
            st.write(result)
            exit()
    processed += len(new_records)
    st.info(f"Wrote batch of {len(new_records)} files (total processed: {processed})")

# Save the bookmark so the next run only processes newer files
pq.set_state(latest_batched_at)
st.success(f"Processed {processed} files.")
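The decoding step in the sync script above assumes every file is valid UTF-8 text, so a binary file (an image, for example) would raise an error. The sketch below shows one possible way to make that step tolerant, assuming the response is a dict with "content" and "encoding" keys as used in the script; the helper name decode_file_content is hypothetical and not part of the Peliqan API.

```python
import base64
from typing import Optional

def decode_file_content(resp: dict) -> Optional[str]:
    """Return the file text, or None when it cannot be decoded (hypothetical helper)."""
    content = resp.get("content")
    if content is None:
        return None
    if resp.get("encoding") != "base64":
        # Content that is not base64-encoded is passed through unchanged.
        return content
    try:
        return base64.b64decode(content).decode("utf-8")
    except ValueError:
        # Covers both malformed base64 and bytes that are not valid UTF-8.
        return None

# Build a sample response shaped like the one read in the sync script.
sample = {
    "encoding": "base64",
    "content": base64.b64encode(b"print('hi')\n").decode("ascii"),
}
text = decode_file_content(sample)
```

In the sync loop, a file for which the helper returns None could be skipped or stored with empty content, depending on what you need downstream.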
# Script: commit a file to Github (adds the file, or updates it if it already exists)
import base64, json

github_api = pq.connect('Github')

content = { "test_key": "test content" }
content_string = json.dumps(content)
base64_content = base64.b64encode(bytes(content_string, 'utf-8')).decode('utf-8')

file = {
    "path": "test_folder/test_file.json",
    "repo": "my_repo",
    "owner": "my_github_username",
    "message": "This is a commit from Peliqan script",
    "base64_content": base64_content,
    "committer_name": "John",
    "committer_email": "john@peliqan.io"
}

# Look up the file first: an existing file has a SHA, which is required to update it
get_file = github_api.get('file', file)
if 'sha' in get_file:
    st.text("Updating file")
    file['sha'] = get_file['sha']
    result = github_api.update('file', file)
else:
    st.text("Adding new file")
    result = github_api.add('file', file)
st.text(result["status"])
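When committing several files, the encoding and payload construction from the commit script above can be factored into a small function. The sketch below is a minimal illustration: the payload field names are copied from that script, while the helper name build_file_payload and its parameters are hypothetical. It runs without a Peliqan connection, so it only builds and checks the payload and does not send anything to Github.

```python
import base64
import json

def build_file_payload(path, data, repo, owner, message, committer_name, committer_email):
    """Build the dict passed to the connector for one JSON file (hypothetical helper)."""
    content_string = json.dumps(data)
    # The connector expects the file content as a base64-encoded string.
    base64_content = base64.b64encode(content_string.encode("utf-8")).decode("utf-8")
    return {
        "path": path,
        "repo": repo,
        "owner": owner,
        "message": message,
        "base64_content": base64_content,
        "committer_name": committer_name,
        "committer_email": committer_email,
    }

payload = build_file_payload(
    path="test_folder/test_file.json",
    data={"test_key": "test content"},
    repo="my_repo",
    owner="my_github_username",
    message="This is a commit from Peliqan script",
    committer_name="John",
    committer_email="john@peliqan.io",
)

# Decode the payload again to confirm the content survives the round trip.
decoded = json.loads(base64.b64decode(payload["base64_content"]).decode("utf-8"))
```

The resulting payload can then be used in place of the file dict in the commit script, with the sha field added before an update.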