Query & integrate data

import lamindb as ln
import bionty as bt

→ connected lamindb: testuser1/test-facs

ln.context.uid = "wukchS8V976U0000"
ln.context.track()

→ notebook imports: bionty==0.49.1 lamindb==0.76.4

→ created Transform('wukchS8V976U0000') & created Run('2024-09-06 08:30:51.815520+00:00')

Inspect the CellMarker registry

Inspect your aggregated cell marker registry as a DataFrame:

bt.CellMarker.df().head()

	uid	name	synonyms	description	gene_symbol	ncbi_gene_id	uniprotkb_id	source_id	organism_id	run_id	created_by_id	updated_at
id
41	3ZFziy5ims8J	CD14/19	None	None	None	None	None	NaN	1	2	1	2024-09-06 08:30:45.153391+00:00
40	31nZfqQo8yZg	CD103		None	ITGAE	3682	P38570	28.0	1	2	1	2024-09-06 08:30:45.143810+00:00
39	1iLDs6cZIpxj	CD69		None	CD69	969	Q07108	28.0	1	2	1	2024-09-06 08:30:45.143767+00:00
38	525YfNUB967z	CD49B		None	ITGA2	3673	P17301	28.0	1	2	1	2024-09-06 08:30:45.143730+00:00
37	3IPMBjs68Vy1	CXCR4		None	CXCR4	7852	P61073	28.0	1	2	1	2024-09-06 08:30:45.143692+00:00

Search for a marker (the search is synonym-aware):

bt.CellMarker.search("PD-1").df().head(2)

	uid	name	synonyms	description	gene_symbol	ncbi_gene_id	uniprotkb_id	source_id	organism_id	run_id	created_by_id	updated_at
id
29	33vFR1q26vnM	PD1	PID1|PD-1|PD 1	None	PDCD1	5133	A0A0M3M0G7	28	1	1	1	2024-09-06 08:30:24.538763+00:00
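To make "synonym-aware" concrete, here is a minimal pure-Python sketch of the idea: a query counts as a hit when it equals either the record's name or one of its pipe-separated synonyms. This is an illustration only, not lamindb's search implementation (which may also rank fuzzy matches); the helpers `normalize` and `matches` are hypothetical names introduced for this example.

```python
# Illustrative stand-in only; NOT how bt.CellMarker.search is implemented.

def normalize(term: str) -> str:
    """Lowercase and drop hyphens and spaces so 'PD-1', 'PD 1' and 'pd1' compare equal."""
    return term.lower().replace("-", "").replace(" ", "")


def matches(record: dict, query: str) -> bool:
    """Return True if the query equals the record's name or any of its synonyms."""
    synonyms = record["synonyms"].split("|") if record["synonyms"] else []
    candidates = [record["name"], *synonyms]
    return normalize(query) in {normalize(c) for c in candidates}


# Name and synonyms copied from the search result above.
pd1 = {"name": "PD1", "synonyms": "PID1|PD-1|PD 1"}
# Name copied from the registry preview; its synonyms field is empty.
cd69 = {"name": "CD69", "synonyms": ""}

hits = [r["name"] for r in (pd1, cd69) if matches(r, "PD-1")]
print(hits)
```

The normalization step is an assumption made for this sketch; it is what lets a query such as "PD-1" find a record whose canonical name is "PD1".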

Look up markers with auto-complete:

markers = bt.CellMarker.lookup()
markers.cd8

CellMarker(uid='1xRpnOHIkdyE', name='CD8', synonyms='', gene_symbol='CD8A', ncbi_gene_id='925', uniprotkb_id='P01732', created_by_id=1, run_id=1, source_id=28, organism_id=1, updated_at='2024-09-06 08:30:24 UTC')
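The auto-complete behavior comes from exposing each record as an attribute on the lookup object. A rough sketch of that idea in plain Python follows; the function `build_lookup` and its key-normalization rules are assumptions for illustration and may differ from what lamindb actually does.

```python
import re
from types import SimpleNamespace


def build_lookup(names):
    """Expose each name as a lowercase, identifier-safe attribute on a namespace."""
    fields = {}
    for name in names:
        # Replace runs of non-word characters (e.g. '/') with an underscore.
        key = re.sub(r"\W+", "_", name).strip("_").lower()
        if key and key[0].isdigit():
            key = f"_{key}"  # attribute names cannot start with a digit
        fields[key] = name
    return SimpleNamespace(**fields)


# Marker names taken from the registry preview above.
lookup = build_lookup(["CD8", "CD103", "CD69", "CD14/19", "CXCR4"])
print(lookup.cd8)
print(lookup.cd14_19)
```

Because the attributes are ordinary Python names, an IDE or notebook can offer them via tab completion, which is the convenience `markers.cd8` relies on.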

Query artifacts by markers

Query panels and artifacts based on markers, e.g., which artifacts have 'CD8' in their flow panel:

panels_with_cd8 = ln.FeatureSet.filter(cell_markers=markers.cd8).all()

ln.Artifact.filter(feature_sets__in=panels_with_cd8).df()

	uid	version	is_latest	description	key	suffix	type	size	hash	n_objects	n_observations	_hash_type	_accessor	visibility	_key_is_virtual	storage_id	transform_id	run_id	created_by_id	updated_at
id
1	qGXQ3oXRQrrfYSAf0000	None	True	Alpert19	None	.h5ad	dataset	33374864	QNP1c3p6scaAwPo9AW8fLw	None	166537	md5	AnnData	1	True	1	1	1	1	2024-09-06 08:30:34.881573+00:00
2	cWmcWgk8zKjnt0Sj0000	None	True	Oetjen18_t1	None	.h5ad	dataset	46506448	WbPHGIMM_5GT68rC8ZydHA	None	241552	md5	AnnData	1	True	1	2	2	1	2024-09-06 08:30:45.686627+00:00
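The query above is a two-step join: first select the feature sets (panels) that contain the marker, then select the artifacts linked to any of those panels. The same logic can be sketched with plain dictionaries. The panel names, panel contents, and the third artifact below are hypothetical; only the artifact descriptions `Alpert19` and `Oetjen18_t1` come from the output above.

```python
# Hypothetical in-memory stand-ins for feature sets (panels) and artifacts.
panels = {
    "panel_a": {"CD8", "CD3", "CD27"},  # hypothetical panel contents
    "panel_b": {"CD8", "CD45RA"},       # hypothetical panel contents
    "panel_c": {"CXCR4", "CD69"},       # hypothetical panel without CD8
}
artifact_panels = {
    "Alpert19": ["panel_a"],
    "Oetjen18_t1": ["panel_b"],
    "other_dataset": ["panel_c"],  # hypothetical artifact, added as a negative case
}

# Step 1: panels that contain the marker (mirrors FeatureSet.filter(cell_markers=...)).
panels_with_cd8 = {name for name, markers in panels.items() if "CD8" in markers}

# Step 2: artifacts linked to at least one of those panels
# (mirrors Artifact.filter(feature_sets__in=...)).
artifacts_with_cd8 = sorted(
    artifact
    for artifact, linked in artifact_panels.items()
    if panels_with_cd8.intersection(linked)
)
print(artifacts_with_cd8)
```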

Create a lookup object for the Feature registry:

features = ln.Feature.lookup()

Find the cell markers shared by the two artifacts:

artifacts = ln.Artifact.filter(feature_sets__in=panels_with_cd8).list()

shared_markers = artifacts[0].features["var"] & artifacts[1].features["var"]
shared_markers.list("name")

['Cd4', 'CD8', 'CD3', 'CD27', 'Ccr7', 'CD45RA']
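Conceptually, the `&` operator above computes the intersection of the two marker sets. A minimal sketch with plain lists follows, assuming hypothetical panel contents: the six shared names are taken from the output above, while `CD103` and `CXCR4` are added here only to show markers that appear in one panel and drop out.

```python
# Hypothetical panels; the six shared names come from the output above.
panel_1 = ["Cd4", "CD8", "CD3", "CD27", "Ccr7", "CD45RA", "CD103"]
panel_2 = ["CD8", "Cd4", "CD3", "CXCR4", "CD27", "Ccr7", "CD45RA"]

# Keep panel_1's order while testing membership against a set for speed.
in_panel_2 = set(panel_2)
shared = [marker for marker in panel_1 if marker in in_panel_2]
print(shared)

# The comparison is case-sensitive: 'Cd4' and 'CD4' would not match.
print("CD4" in in_panel_2)
```

The case-sensitivity note matters for mixed-case names such as 'Cd4' and 'Ccr7': in this sketch, markers only count as shared if both panels spell them identically.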